Do news access pathways moderate filter bubbles? Evidence from a longitudinal analysis of data donations

Abstract

News access pathways shape the informational diversity of users’ news repertoires, with distributed modes such as search and social media generally providing broader exposure than direct navigation. Yet whether these differences produce filter bubble dynamics—gradual narrowing of diversity over time—remains underevaluated. We provide a direct test using donated browsing histories from n = 179 users in the Netherlands spanning approximately 1 year, combining mixed-effects models with a within-between decomposition and a content-level diversity measure capturing semantic differences between consumed articles. Within-person estimates confirm that distributed pathways enhance both outlet and topic diversity relative to direct access. For outlet diversity, search and social media show moderate convergence effects, with their diversity contributions partially eroding over sustained use; topic diversity trajectories show no pathway moderation. The within-between decomposition further reveals that aggregate pathway–diversity associations can mask opposing selection dynamics, underscoring the need for temporally granular, within-person designs when evaluating filter bubble claims.

Keywords

filter bubbles, news pathways, digital journalism, content diversity, data donations

Introduction

The pathways audiences use to reach news—direct access, search engines, social media, email, or messaging—increasingly shape online news consumption (Fletcher et al., 2023). Each pathway offers distinct affordances and curation processes that influence which stories audiences encounter (Wojcieszak et al., 2022). Understanding how these differences affect consumption diversity has become central to digital journalism scholarship.

Despite extensive research on news pathways, direct empirical evidence on whether pathways contribute to filter bubble dynamics—the gradual narrowing of diversity over time—remains limited. As Michiels et al. (2022) note, definitional ambiguity has produced studies with temporal models poorly suited to detect dynamic filtering processes. Most digital trace studies either aggregate consumption records or employ cross-sectional or small-T designs, lacking the temporal sensitivity to measure filter bubble effects. This leaves us with limited empirical basis to evaluate whether filter bubbles emerge in naturalistic online news consumption and how pathways might differentially shape these processes.

We address this gap by providing a direct test of pathway-moderated filter bubble dynamics using granular longitudinal measurement of article selection within donated web browsing histories. We extend existing research through a novel application of Stirling’s (2007) diversity framework, which measures diversity across three dimensions: variety (number of different categories), balance (distribution evenness), and disparity (differences between categories). We integrate this framework with text embeddings to measure semantic content dynamics across individual article selection histories spanning approximately 1 year. Where previous studies measured diversity through categorical classifications (e.g. outlets or topics), our approach analyzes the actual textual content of consumed articles, capturing what users read with each selection rather than simply how many sources they visit or which topical categories their selections represent. This addresses a key limitation of prior work that treats all within-category exposures as equivalent, when outlets or topics vary considerably in semantic composition.

We define filter bubbles operationally as decreases in news consumption diversity over time, whether driven by algorithmic filtering, social network effects, or individual selection behaviors. This definition allows us to test two questions:

Research Question 1 (RQ1): Do different access pathways contribute different levels of content diversity to users’ news diets?

Research Question 2 (RQ2): Do different access pathways moderate diversity trajectories, systematically shifting users toward greater or lesser diversity over time?

Our approach offers three innovations: (1) a direct test of pathway-moderated filter bubble formation using continuous measurement across users’ multi-pathway consumption; (2) analysis of donated browsing histories spanning a full year of naturalistic usage; and (3) application of Stirling’s framework with semantic embeddings to capture not just categorical but semantic disparity between consumed content. This methodology sheds new light on whether filter bubbles form in real-world news consumption and how access pathways shape information diversity over time.

Background

Filter bubbles

The term “filter bubble” (Pariser, 2011) typically refers to the phenomenon where personalized algorithms reinforce pre-existing preferences, progressively reducing the diversity of information to which audiences are exposed. Scholars have feared that such filtering could isolate citizens into separate information environments, depriving them of the shared informational basis necessary for informed civic engagement and deliberation (Habermas, 2010).

The reality appears more nuanced than early deterministic portrayals suggested. Where initial theorizing envisioned deep isolation through reinforcing filtering cycles, recent scholarship emphasizes that multiple factors drive news consumption and various mechanisms exist to break such cycles (Dahlgren, 2021; Trilling, 2024; Zuiderveen Borgesius et al., 2016). Empirical findings have been mixed (Bruns, 2019; Kitchens et al., 2020; Ross Arguedas et al., 2022). Many studies suggest news diets remain balanced through consumption across multiple channels (Cardenal et al., 2019; Dubois and Blank, 2018; Flaxman et al., 2016; Hartmann et al., 2025; Ross Arguedas et al., 2022), while theoretical contributions highlight the role of individual self-regulation (Slater, 2015) and complex environmental interdependencies (Trilling, 2024) in maintaining diverse exposure.

Understanding filter bubbles requires clear conceptualization, particularly in demarcation from the closely related concept of “echo chambers” (Sunstein, 2017). As Fletcher et al. (2023) note, there remains a persistent “. . . lack of clarity about the precise meaning of these terms, and how they differ from one another (if at all).” One potential distinction rests on mechanism and the resultant locus of empirical concern. Echo chambers emphasize social-epistemic structures where like-minded individuals reinforce each other’s views, primarily through homophilic network emergence (Hartmann et al., 2025). Filter bubbles, by contrast, focus on the processes through which information filtering occurs—whether via algorithmic curation or individual self-selection—and their effects on individual-level exposure patterns.

This distinction matters for examining how access pathways may moderate filtering effects. We adopt the filter bubble concept because we are interested in filtering processes across multiple access routes, rather than resultant patterns of social clustering or belief reinforcement. Building on Michiels et al. (2022), we define a news filter bubble as a decrease in the diversity of a user’s news consumption over time, in any dimension of diversity, resulting from filtering by algorithmic systems, social network effects, and/or individual selection behaviors. This definition acknowledges the variety of filtering mechanisms in online news environments while emphasizing the temporal dynamics that characterize filter bubbles as a gradual process of diversity collapse.

Pathways to news

News pathways function as the primary channels through which news is filtered in digital environments (Wojcieszak et al., 2022). A fundamental distinction lies between direct access (users actively seeking known sources) and distributed pathways (search, social media, email, messaging) that deliver news through intermediary platforms subject to algorithmic or social network mediation (Fletcher et al., 2023). These pathways vary substantially in their affordances and curational dynamics, creating distinct information environments that may intensify or mitigate filter bubble effects (Möller et al., 2020; Smets, 2022; Vermeer et al., 2020). Studies in this area do not consistently distinguish between filter bubble and echo chamber dynamics; we draw on them for their empirical relevance to pathway–diversity relationships, agnostic as to which label is applied.

Direct access, where users intentionally navigate to specific websites, is the pathway most dependent on self-selection, occurring through established routines (Möller et al., 2020) or contextual triggers (Vermeer et al., 2020). Filtering operates primarily through cognitive preferences and habits (Dahlgren, 2021), alongside editorial and algorithmic curation within chosen outlets (Thurman et al., 2019). Direct access appears to maintain topical breadth due to editorial curation (Dubois and Blank, 2018; Hüllmann and Sensmeier, 2022), yet limits ideological diversity as users gravitate toward preference-reinforcing outlets (Cardenal et al., 2019; Flaxman et al., 2016; Wojcieszak et al., 2022). However, users who primarily access news directly often maintain diverse diets when multi-platform consumption is considered (Ross Arguedas et al., 2022).

Search engines introduce different dynamics through query-driven interfaces combined with algorithmic ranking prioritizing relevance, recency, and authority (Trielli and Diakopoulos, 2019), while incorporating user history and location (Le et al., 2019). This creates a hybrid environment where queries define interest domains while algorithms determine specific content encountered (Van Hoof et al., 2022). Some theorize that search counteracts filter bubbles through “automated serendipity”—exposing users to relevant but unsought content (Fletcher and Nielsen, 2018). Empirical work reveals diversity-enhancing effects, with users encountering ideologically congruent and cross-cutting sources (Cardenal et al., 2019; Flaxman et al., 2016; Ross Arguedas et al., 2022; Wojcieszak et al., 2022) and unfamiliar sources beyond habitual patterns (Ulloa and Kacperski, 2024; Wojcieszak et al., 2022). While search provides diverse exposure, engagement can still reflect partisan preferences through selective clicking (Robertson et al., 2023), though clicks usually distribute across multiple outlets (Steiner et al., 2022; Ulloa and Kacperski, 2024).

Social media platforms layer algorithmic curation with social network effects through feed-ranking systems integrating content-based and collaborative filtering (Wu et al., 2023), creating environments where algorithmic and social factors interact (Barberá, 2020; Bechmann and Nielbo, 2018). Empirical work suggests that social media diversifies exposure through counter-attitudinal content (Beam et al., 2018; Flaxman et al., 2016; Lorenz-Spreen et al., 2023; Scharkow et al., 2020; Wojcieszak et al., 2022) and weak ties providing cross-cutting information (Barberá, 2020; Ross Arguedas et al., 2022), while incidental exposure increases source diversity particularly among less engaged users (Fletcher and Nielsen, 2018; Kitchens et al., 2020). Yet engagement-level filtering persists, with users consuming more partisan content (Cinelli et al., 2021; Kitchens et al., 2020; Nikolov et al., 2019), while algorithmic homophily and network clustering create additional filtering effects (Barberá, 2020; Bechmann and Nielbo, 2018), particularly among highly motivated news consumers (Fletcher and Nielsen, 2018; Kitchens et al., 2020). Platform design significantly influences these dynamics (Monti et al., 2023).

Email newsletters and messaging services represent increasingly important yet understudied pathways. Messaging apps facilitate news sharing within private conversations (Kalogeropoulos, 2021), while email enables newsletter subscriptions from preferred sources (Newman et al., 2024), though effects on consumption diversity remain largely unaddressed (Wojcieszak et al., 2022). Notably, email newsletters occupy an ambiguous position between direct and distributed access: while technically mediated through an intermediary platform, they are subscriber-initiated and typically single-outlet, sharing the self-selective character of direct access more than the algorithmic or social curation of other distributed pathways.

The mechanisms through which pathways shape diversity are complex; within any pathway, journalistic, algorithmic, social, and individual curation processes operate simultaneously and interact in ways difficult to disentangle (Jürgens and Stark, 2022). Our study does not attempt to isolate these within-pathway mechanisms. Rather, we treat pathways as aggregate channels and ask whether dependence on them is associated with filter bubble dynamics over time.

Limitations of current evidence

The mixed empirical findings reflect methodological limitations that hinder robust evidence on filter bubble effects: definitional ambiguity, insufficient temporal granularity, and lack of content-level diversity measurement.

Limitation 1: Definitional Clarity—Persistent conceptual ambiguity around what constitutes a filter bubble represents a fundamental barrier to comparable findings (Bruns, 2019; Dahlgren, 2021; Ross Arguedas et al., 2022; Terren and Borge-Bravo, 2021), reflecting both the challenge of operationalizing a metaphorically inflated concept (Trilling, 2024) and the contested normative nature of the phenomenon (Dahlgren, 2021).

This ambiguity has sustained a loosened empirical cycle. As Michiels et al. (2022) note, most empirical work fails to provide an upfront, falsifiable definition of a filter bubble, resulting in varied operationalization. While measuring different aspects of the concept (e.g. content homogeneity or cross-cutting exposure) is not inherently problematic, researchers often insufficiently acknowledge which dimension they are measuring or its relation to the broader concept. More problematically, many studies do not specify falsifiable hypotheses about filter bubbles as positive filtering processes, instead operationalizing them negatively as the absence of diversity or other desirable properties, precluding direct tests of Pariser’s original theoretical mechanisms. This variation produces markedly different empirical patterns, which have been interpreted as mixed evidence for and against filter bubbles, perpetuating the definitional problem.

Limitation 2: Temporal Granularity—Insufficient temporal granularity in media effects research is closely related to the definitional problem. If filter bubbles are conceptualized as positive filtering processes involving decreasing diversity over time, measurement must be sufficiently sensitive to these temporally situated dynamics (Dahlgren, 2021; Michiels et al., 2022; Terren and Borge-Bravo, 2021).

Studies defining filter bubbles negatively as “states of absence” face the inherent challenge of inferring dynamic processes from static observations: they detect process from outcome rather than observing filtering mechanisms directly. Meanwhile, most longitudinal studies rely on aggregated or widely spaced observations that may miss the granular dynamics through which filtering effects unfold. As Michiels et al. (2023) note, there is a dearth of large-$N$ repeated-measures time-series data enabling robust testing of the filter bubble hypothesis. Ameliorating this demands a shift from “Large $N$ and Small $T$” designs, which sample many users at few time points, toward “Small $N$ and Large $T$” approaches prioritizing continuous, granular measurement (Trilling, 2024). The closest precedent in news trace research is Jürgens and Stark (2022), who applied a within-between decomposition to 4 months of tracking data, finding that short-term platform use (within-effects) uniformly increases exposure diversity, while long-term heavy reliance (between-effects) produces divergent and sometimes negative effects. In addition, Michiels et al. (2023) conducted a user-level longitudinal study of recommender systems on European news websites, employing negative binomial mixed-effects models that detected small topic variety decreases during initial weeks of user-system interaction.

Limitation 3: Diversity Measurement—A third limitation concerns the granularity with which diversity is measured. Content diversity is usually approximated through count (e.g. Michiels et al., 2023) or balance (e.g. Fletcher et al., 2023) measurements, assuming that (balanced) exposure to different categories of normative interest (e.g. outlet, topic, political leaning) reliably indicates diverse consumption. This approach cannot detect filtering processes where curation gradually narrows content toward semantically similar material within categorical boundaries. A user might appear to maintain diverse consumption by reading multiple outlets or topics while actually encountering increasingly homogeneous content reinforcing similar themes, perspectives, or frames within those categories. This measurement limitation is remarkably persistent in digital media research, as noted by Loecherbach et al. (2020), and might explain contradictory findings in filter bubble research, as filtering effects could operate below the categorical level these approaches operationalize.

A framework for testing the filter bubble hypothesis

To address these limitations, we adapt Michiels et al.’s (2022) definition-first framework for measuring filter bubbles. Taking our definition of news filter bubbles as decreases in diversity over time, their framework demands specification of study period, observation window, diversity measure, and statistical model. Our approach directly addresses the limitations outlined above: operationalizing an empirically falsifiable definition (limitation 1), ensuring temporal sensitivity through granular longitudinal measurement (limitation 2), and applying a diversity measure that accounts for categorical variety and balance alongside semantic diversity within and between categories (limitation 3).

We examine news consumption over a 1-year study period to capture longer-term filter bubble effects. We analyze consumption using both weekly and monthly observation windows—weekly binning detects short-term fluctuations in pathway usage and diversity, while monthly aggregation retains participants who read news less frequently, avoiding sample bias toward heavy consumers.

For diversity metrics, we measure both outlet-semantic and topic-semantic diversity using Stirling’s (2007) framework. This integrates traditional category-level diversity (how many different outlets or topics users encounter and how evenly their consumption is distributed) with semantic-level diversity (how conceptually distinct the content is), offering a more comprehensive assessment of information exposure than previous approaches.

Our statistical analysis adapts Michiels et al.’s (2023) mixed-effects modeling approach to accommodate our continuous diversity measures. The models address two core questions: (1) Do different access pathways affect baseline diversity levels? and (2) Do different pathways moderate diversity trajectories over time? The key parameters are the interactions between time and within-between decomposed pathway usage proportions (direct access, search engines, social media, email, and messaging services), which reveal whether specific pathways accelerate or decelerate diversity change over time. The random effects structure accounts for user-specific baselines and trajectories as well as temporal fluctuations due to news cycles. We include log-transformed volume controls to account for diminishing returns in the relationship between reading frequency and diversity.

Data

Our data pipeline consists of four stages: (1) collection of donated browsing histories, (2) identification of news articles, (3) classification of access pathways, and (4) content representation through embeddings and topic modeling.

Data collection

We obtained web browsing histories from $n = 179$ participants who donated their data through Google Takeout under GDPR data portability provisions. Data donation—in which participants exercise portability rights to share digital trace data with researchers—has emerged as a promising method for studying online behavior, offering access to ecologically valid behavioral records produced by participants as they use digital services in their day-to-day life (Pfiffner and Friemel, 2023; Welbers et al., 2024).

Data collection occurred during Lowlands Science, a science-focused program at Lowlands, one of the Netherlands’ largest annual music and culture festivals. Participants were first engaged with the topic of filter bubbles in digital media, before being offered the opportunity to donate their data to research. Participants used specialized software (Welbers et al., 2024) to download their complete Google browsing histories (including data from multiple devices, provided they were signed into their account at the time of browsing), could review data summaries before donating, and provided informed consent. This yielded chronological records of website visits for up to 1 year retrospectively (≈ August 2021 to August 2022), with an average of 11.2 months of data per participant and browsing activity recorded on an average of 241 days (79.2% of their total date range).

This researcher-mediated, in-person approach involves trade-offs relative to alternative data donation methods. The science-engaged festival context skews the sample toward younger, more female, more educated, and likely more digitally literate participants. However, online and panel-based recruitment carries well-documented limitations, including professional survey-takers, incentive-driven participation, and data quality concerns (Clemm von Hohenberg et al., 2024), which such in-person collection mitigates.

News identification

We identified news website visits through domain-level matching against a manually coded classification dataset (Loecherbach et al., 2024), defining news traces as visits to institutional news source domains. The domain mapping achieved 89% classification success. News visits comprised only 1.2% of all browsing activity, consistent with existing literature. We distinguished between article visits and structural website sections by decomposing URLs, identifying 50,932 article visits (39,272 unique articles) and 40,441 visits to other sections.

Pathway classification

We classified access pathways using a multi-label approach integrating URLs, metadata, and temporal proximity to categorize each visit as direct navigation, search engine referral, social media referral, email referral, or messaging service referral.

Classification followed a hierarchical procedure. We first identified unambiguous pathways through explicit signals including typed navigation, bookmark use, or URL parameters. For remaining visits, we applied temporal proximity rules allowing multiple pathway labels per trace. Visits were classified as search-mediated if preceded by a non-navigational search query within 1 minute, as direct if the same domain was accessed within 5 minutes prior, or as social media-, email-, or messaging-mediated if the respective platform was visited within 5 minutes prior. For unclassified cases, we extended the temporal window to 30 minutes when this yielded a single unambiguous pathway. This approach classified 80% of news article traces. Among classified traces, 82% received single pathway labels while 18% carried multiple labels.
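The temporal proximity rules described above can be illustrated with a minimal sketch. It is not the classification code used in the study: the `Visit` record, its `kind` labels, and the `explicit` field are hypothetical, and the sketch assumes that search visits in the history have already been filtered to non-navigational queries.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Visit:
    t: float                        # seconds since an arbitrary origin
    domain: str
    kind: str                       # hypothetical: "news", "search", "social", "email", "messaging"
    explicit: Optional[str] = None  # pathway given by an explicit signal, if any

# Primary windows in seconds: 1 minute for search, 5 minutes otherwise.
WINDOWS = {"search": 60, "direct": 300, "social": 300, "email": 300, "messaging": 300}
FALLBACK = 1800  # extended 30-minute window

def pathways_within(visit, history, windows):
    """Collect every pathway label whose rule fires inside its window."""
    labels = set()
    for prev in history:
        gap = visit.t - prev.t
        if gap <= 0:
            continue  # only earlier visits count
        if prev.domain == visit.domain:
            label = "direct"
        elif prev.kind in windows:
            label = prev.kind
        else:
            continue
        if gap <= windows[label]:
            labels.add(label)
    return labels

def classify(visit, history):
    """Return a set of pathway labels (possibly several, possibly empty)."""
    if visit.explicit:  # unambiguous signal takes precedence
        return {visit.explicit}
    labels = pathways_within(visit, history, WINDOWS)
    if labels:
        return labels
    # Extended window is accepted only if it yields exactly one pathway.
    wide = pathways_within(visit, history, {k: FALLBACK for k in WINDOWS})
    return wide if len(wide) == 1 else set()
```

Returning a set rather than a single label mirrors the multi-label design: a visit preceded by both a recent search and a recent social media visit keeps both labels, and an empty set marks a trace that remains unclassified.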

Content representation

We scraped article titles and body text using NewsPaper4k, obtaining content for 97% of articles. We generated three content representations. First, we computed embeddings using nomic-ai/nomic-embed-text-v2-moe, a pre-trained multilingual encoder model (Nussbaum et al., 2025), to capture semantic meaning for computational measurement of content similarity. Second, we derived outlet labels directly from source domains. Third, we assigned unsupervised topic labels via BERTopic (Grootendorst, 2022), with manual coding consolidating 93 initial topics into 11 categories: (1) conflicts and refugees; (2) politics and elections; (3) economics, business, and employment; (4) transport, infrastructure, and environment; (5) crime and emergency; (6) health and pandemic; (7) entertainment and media; (8) lifestyle and culture; (9) housing and real estate; (10) sports and olympics; and (11) local and community. For details on the topic modeling procedure, see Appendix 2.

Sample characteristics

The final dataset comprised $n = 179$ users providing $n = 35{,}654$ news article traces with complete pathway classification, content embeddings, and topic classifications. Participants ranged from 18 to 50 years ($M = 28.1$, $Mdn = 26.0$, $SD = 6.8$), with 59.8% female and 70.9% holding bachelor’s or master’s degrees, skewing younger, more female, and more educated than Dutch population averages. However, participants donated authentic, retrospectively collected browsing histories accumulated without prior knowledge of the analysis, eliminating reactivity effects.

News article traces were highly skewed across users ($M = 275.3$, $Mdn = 72.0$, $SD = 517.1$, range = 1–3,871), with 40.5% contributing fewer than 50 traces, while 15.1% contributed over 500. This distribution reflects natural variation in news engagement and motivates our weekly and monthly binning approaches. For trace distributions, see Appendix 1.

Measures

Content diversity

Content diversity is a multi-faceted concept whose measurement emphasis depends on the normative model of democracy pursued (Loecherbach et al., 2020). Following Fletcher et al. (2023), we adopt a deliberative perspective prioritizing breadth of exposure to information, irrespective of congeniality. This suits the Netherlands, where strong public broadcasting has institutionalized diverse perspectives across outlets rather than concentrating them in partisan camps (Commissariaat voor de Media, 2024). We measure outlet-semantic and topic-semantic diversity using an approach that integrates category-level and semantic-level diversity.

Stirling’s diversity framework

We adopt Stirling’s (2007) diversity framework to perform this integration, operationalizing three complementary aspects: variety (how many different categories), balance (distribution evenness across categories), and disparity (how different the categories are from each other).

Stirling’s diversity is formally expressed as

$$D = \sum_{i \neq j} p_{i} \, p_{j} \, d_{ij} \tag{1}$$

where $p_{i}$ and $p_{j}$ are the proportions of categories $i$ and $j$ in the user’s consumption, and $d_{ij}$ is the disparity (semantic distance) between them. The summation excludes diagonal terms ($i \neq j$) since the distance from a category to itself is zero.

Applied to our data, variety represents the number of different categories (outlets or topics) in a user’s news repertoire, balance measures attention distribution across categories using Shannon’s entropy, and disparity quantifies how semantically different consumed categories are based on textual content, measured through article embeddings from the pre-trained language model. For disparity, we use Earth Mover’s Distance (EMD) rather than cosine similarity. While cosine similarity measures angular distance between average semantic positions, EMD measures the minimum effort required to transform one category’s entire embedding distribution into another’s, accounting for both distance and variance within each category. This distributional sensitivity provides more accurate semantic disparity measurement than centroid-based approaches. See Appendix 3 for further detail.
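As a worked illustration of Equation (1), the following minimal sketch computes Stirling’s diversity for one user-period. The outlet names, proportions, and disparity values are hypothetical; in the study the disparities would be EMD values computed from article embeddings, which this sketch takes as given and assumes to be symmetric.

```python
from itertools import combinations

def stirling_diversity(proportions, disparity):
    """Sum of p_i * p_j * d_ij over all ordered pairs with i != j.

    proportions: dict mapping category -> share of consumption (sums to 1)
    disparity:   dict mapping a sorted (i, j) tuple -> distance d_ij
    """
    total = 0.0
    for i, j in combinations(sorted(proportions), 2):
        # Each unordered pair appears twice in the ordered sum (d_ij = d_ji).
        total += 2 * proportions[i] * proportions[j] * disparity[(i, j)]
    return total

# Hypothetical user-period: three outlets with uneven attention.
shares = {"A": 0.5, "B": 0.3, "C": 0.2}
distances = {("A", "B"): 0.2, ("A", "C"): 0.8, ("B", "C"): 0.6}
print(round(stirling_diversity(shares, distances), 3))  # 0.292
```

A period with a single category has no off-diagonal pairs and therefore a diversity of zero, which is why such periods carry no information about disparity.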

Outlet-Semantic Diversity measures variety, balance, and semantic disparity across news sources. Traditional outlet diversity treats all source differences equally, but outlet-semantic diversity weighs differences by actual content similarity. High outlet-semantic diversity indicates consumption from many sources (variety), relatively even attention across them (balance), and sources providing semantically distinct rather than redundant coverage (disparity).

Topic-Semantic Diversity measures variety, balance, and semantic disparity across subject matter. Traditional topic diversity assumes any two topic categories contribute equivalent diversity regardless of content similarity. Topic-semantic diversity instead examines semantic overlap within and between categories. High topic-semantic diversity indicates engagement with many topics (variety), even attention distribution (balance), and topics covering semantically distinct rather than overlapping journalistic terrain (disparity).

Modeling

We construct linear mixed-effects models to assess how different news access pathways relate to changes in content diversity over time. Our dependent variables are outlet-semantic and topic-semantic diversity. Key independent variables are the decomposed proportions of articles accessed via search engines, social media, email, and messaging services, with direct access as the reference category. For articles with multiple pathway classifications, we distribute each article’s weight equally across identified pathways, then calculate pathway proportions by normalizing across total articles per user-time observation. Additional variables include time since first appearance and pathway–time interactions that capture how each pathway influences diversity change relative to direct access.
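The weighting of multi-label articles can be sketched as follows. This is an illustrative reading of the procedure, not the study’s code: the function name and the input format (one set of pathway labels per article within a single user-time observation) are assumptions.

```python
from collections import defaultdict

PATHWAYS = ("direct", "search", "social", "email", "messaging")

def pathway_proportions(article_labels):
    """Fractional pathway shares for one user-time observation.

    article_labels: list with one set of pathway labels per article.
    Each article carries a total weight of 1, split equally across its labels.
    """
    labelled = [labels for labels in article_labels if labels]
    weights = defaultdict(float)
    for labels in labelled:
        for pathway in labels:
            weights[pathway] += 1.0 / len(labels)
    n = len(labelled)
    return {p: (weights[p] / n if n else 0.0) for p in PATHWAYS}

# Four articles, one of which carries two labels.
shares = pathway_proportions([{"direct"}, {"search", "social"}, {"direct"}, {"search"}])
print(shares["direct"], shares["search"], shares["social"])  # 0.5 0.375 0.125
```

Because every article contributes a total weight of one, the proportions sum to one across pathways, so that the non-reference pathways can be interpreted relative to direct access.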

We estimate parallel models for weekly and monthly observation windows. Weekly aggregation provides finer temporal granularity; monthly aggregation includes lower-frequency news consumers. We exclude user-time periods with fewer than two topics or outlets, as Stirling’s diversity requires multiple categories.
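A minimal sketch of the binning and exclusion step is given below. It assumes ISO calendar weeks and calendar months as bin boundaries, which the text does not specify, and uses a hypothetical trace format of (user, date, category) tuples, where the category is an outlet or a topic.

```python
from collections import defaultdict
from datetime import date

def bin_key(day, window):
    """Map a date to its bin: (ISO year, ISO week) or (year, month)."""
    if window == "weekly":
        iso = day.isocalendar()
        return (iso[0], iso[1])
    return (day.year, day.month)

def eligible_periods(traces, window):
    """Return the (user, bin) pairs that contain at least two categories."""
    categories = defaultdict(set)
    for user, day, category in traces:
        categories[(user, bin_key(day, window))].add(category)
    return {key for key, cats in categories.items() if len(cats) >= 2}

traces = [
    ("u1", date(2022, 1, 3), "outlet_a"),
    ("u1", date(2022, 1, 5), "outlet_b"),
    ("u1", date(2022, 1, 12), "outlet_a"),
]
print(eligible_periods(traces, "weekly"))   # {('u1', (2022, 1))}
print(eligible_periods(traces, "monthly"))  # {('u1', (2022, 1))}
```

In the weekly result the tuple refers to ISO week 1, and the second week is dropped because it contains a single outlet; in the monthly result the same tuple refers to January, where all three traces are retained. This illustrates how coarser bins keep lower-frequency readers in the sample.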

Within-between decomposition

Because pathway usage varies both between users and within users over time, raw pathway proportions conflate two distinct sources of variation: stable individual differences in pathway preferences and period-to-period shifts in a given user’s pathway behavior. Users who habitually rely on distributed pathways may differ systematically from those who favor direct access in ways that independently affect diversity. Without separating these sources, pathway coefficients may conflate the effect of the pathway itself with the characteristics of the people who use it (Haim et al., 2021).

We address this using a within-between decomposition, also known as the Mundlak or hybrid model (Mundlak, 1978). For each pathway $p$ and user $i$, we decompose the time-varying pathway proportion $Pathway_{p,it}$ into two components:

$$\bar{Pathway}_{p,i} = \frac{1}{T_{i}} \sum_{t=1}^{T_{i}} Pathway_{p,it} \qquad \text{(person-mean, between)} \tag{2}$$

$$\tilde{Pathway}_{p,it} = Pathway_{p,it} - \bar{Pathway}_{p,i} \qquad \text{(deviation from person-mean, within)} \tag{3}$$

The between-person component ($\bar{Pathway}_{p,i}$) captures stable individual differences: whether users who generally rely more on a given pathway show different diversity levels. These effects are potentially confounded by unobserved individual characteristics. The within-person component ($\tilde{Pathway}_{p,it}$) captures period-specific deviations: whether the same user shows different diversity when they use a pathway more than their own average in a given period. Because within-person variation is purged of stable individual characteristics, these estimates provide a stronger basis for attributing diversity effects to the pathways themselves.
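The decomposition can be sketched for a single pathway as follows. The row format (user, period, proportion) and the function name are hypothetical and serve only to illustrate the person-mean centering.

```python
from collections import defaultdict
from statistics import fmean

def within_between(rows):
    """Split each pathway proportion into a person-mean and a deviation.

    rows: list of (user, period, proportion) for one pathway.
    Returns (user, period, between, within) tuples, where
    between = the user's mean proportion over their observed periods and
    within  = proportion - between.
    """
    by_user = defaultdict(list)
    for user, _, value in rows:
        by_user[user].append(value)
    means = {user: fmean(values) for user, values in by_user.items()}
    return [(u, t, means[u], x - means[u]) for u, t, x in rows]

rows = [("a", 1, 0.2), ("a", 2, 0.4), ("a", 3, 0.6), ("b", 1, 0.1), ("b", 2, 0.3)]
for user, period, between, within in within_between(rows):
    print(user, period, round(between, 2), round(within, 2))
```

By construction the within component sums to zero for every user, so it carries no information about stable differences between users; those are absorbed entirely by the between component.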

Model specifications

We estimate four parallel models examining each diversity dimension and time binning:

M1a: Outlet Diversity, Weekly—Outlet-semantic diversity change across pathways, weekly binning

M1b: Outlet Diversity, Monthly—Outlet-semantic diversity change across pathways, monthly binning

M2a: Topic Diversity, Weekly—Topic-semantic diversity change across pathways, weekly binning

M2b: Topic Diversity, Monthly—Topic-semantic diversity change across pathways, monthly binning

Each model takes the form

Y_{i t} \sim Normal (μ_{i t}, σ^{2})

\begin{aligned}
μ_{i t} = {} & β_{0} \\
& + β_{T} T_{i t} && \text{Base time effect} \\
& + \sum_{p} β_{{Pathway}_{p}}^{W} {\tilde{Pathway}}_{p, i t} && \text{Within-person pathway effects (RQ1)} \\
& + \sum_{p} β_{{Pathway}_{p}}^{B} {\bar{Pathway}}_{p, i} && \text{Between-person pathway effects (RQ1)} \\
& + \sum_{p} β_{T \times {Pathway}_{p}}^{W} (T_{i t} \times {\tilde{Pathway}}_{p, i t}) && \text{Time $\times$ within interactions (RQ2)} \\
& + \sum_{p} β_{T \times {Pathway}_{p}}^{B} (T_{i t} \times {\bar{Pathway}}_{p, i}) && \text{Time $\times$ between interactions (RQ2)} \\
& + β_{X} \ln (X_{i t}) && \text{Volume control} \\
& + β_{Age} {Age}_{i} + β_{Gen} {Gen}_{i} + β_{Edu} {Edu}_{i} && \text{Demographic controls} \\
& + b_{0 i} + b_{T i} T_{i t} + g_{t} && \text{Random effects}
\end{aligned}

where $Y_{i t}$ is outlet/topic-semantic diversity for user $i$ in time $t$ , $T_{i t}$ is time since user $i$ first appeared, ${\tilde{Pathway}}_{p, i t}$ is the within-person deviation in pathway proportion for pathway $p$ , ${\bar{Pathway}}_{p, i}$ is the person-mean pathway proportion, and $X_{i t}$ is total articles consumed.
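To make the structure of the fixed part concrete, the sketch below assembles the pathway-related predictors for one user-period. It is an illustration only: the column names and the example values are hypothetical, demographic controls and random effects are omitted, and direct access is treated as the reference pathway (so it has no columns of its own).

```python
import math

# The four distributed pathways; direct access is the omitted reference
PATHWAYS = ["search", "social", "messaging", "email"]


def fixed_effect_row(t, within, between, n_articles, interactions=True):
    """Build the time, volume, and pathway predictors for one user-period.

    `within` and `between` map each pathway name to its within-person
    deviation and person-mean proportion, respectively.
    """
    row = {"time": t, "log_volume": math.log(n_articles)}
    for p in PATHWAYS:
        row[f"{p}_w"] = within[p]    # within-person main effect term
        row[f"{p}_b"] = between[p]   # between-person main effect term
        if interactions:
            # Time-by-pathway products, added only in the interaction models
            row[f"time_x_{p}_w"] = t * within[p]
            row[f"time_x_{p}_b"] = t * between[p]
    return row


# Hypothetical user-period: week 10, 25 articles read
within = {"search": 0.1, "social": -0.05, "messaging": 0.0, "email": 0.0}
between = {"search": 0.3, "social": 0.2, "messaging": 0.05, "email": 0.05}
row = fixed_effect_row(10, within, between, 25)
main_only = fixed_effect_row(10, within, between, 25, interactions=False)
```

The interaction version adds eight columns to the main-effects version: one time product per pathway for each of the within and between components.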

Key parameters

The within-between decomposition produces four types of pathway coefficient, addressing our two research questions:

Within-person pathway effects ( $β_{{Pathway}_{p}}^{W}$ ): In periods when a user accesses a greater proportion of news via pathway $p$ than their own average, is their diversity higher or lower? Because these estimates are purged of stable individual characteristics, positive values provide a stronger basis for attributing diversity differences to the pathway itself.

Between-person pathway effects ( $β_{{Pathway}_{p}}^{B}$ ): Do users who habitually access a greater proportion of news via pathway $p$ show higher or lower diversity overall? These effects are confounded by individual selection—users who prefer certain pathways may differ in ways that independently affect diversity.

Within-person moderation effects ( $β_{T \times {Pathway}_{p}}^{W}$ ): In periods when a user accesses a greater proportion of news via pathway $p$ than their own average, does their diversity trajectory change? Negative values indicate that within-person increases in pathway usage accelerate diversity convergence.

Between-person moderation effects ( $β_{T \times {Pathway}_{p}}^{B}$ ): Do users who habitually access a greater proportion of news via pathway $p$ show different diversity trajectories over time? As with the between-person main effects, these are confounded by individual selection, as users who favor certain pathways may differ in ways that independently shape their diversity trajectories.

Together, the first two effects address RQ1, while the last two address RQ2. We estimate each model twice: once with main effects only (RQ1) and once with the interaction terms added (RQ2). We control for reading volume ( $β_{X}$ ), which is expected to be positive and is entered in logarithmic form to reflect its diminishing effect on diversity, and for demographics (age, gender, education).

Random effects

Our random effects structure addresses the multi-level nature of longitudinal user data

{(b_{0 i}, b_{T i})}^{'} \sim N (0, Σ_{b}) \quad \text{and} \quad g_{t} \sim N (0, σ_{g})

User random intercepts ( $b_{0 i}$ ) account for individual differences in diversity preferences, while user random slopes ( $b_{T i}$ ) capture heterogeneity in how individual users’ diversity patterns evolve. These random effects absorb individual-level variation from unmeasured sources beyond pathway preferences, complementing the between-person pathway components which capture only the portion of individual heterogeneity attributable to stable differences in pathway usage. Time random intercepts ( $g_{t}$ ) control for period-specific factors such as major news events affecting all users.

Implementation

We conducted analyses in Python 3.12 using statsmodels (v0.14.6) for mixed-effects regression with crossed random effects. We employed a hierarchical fallback approach: (1) crossed random effects (user intercepts, user slopes, time intercepts), (2) user-only random effects if crossed models failed convergence, and (3) intercept-only specification if needed. We used lbfgs optimization with 1e-6 convergence tolerance and validated that random slope variance estimates were not boundary values to ensure statistical validity.
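The hierarchical fallback described above can be expressed as a small control-flow helper. The sketch below is not the authors' implementation: `fit_with_fallback` is a hypothetical function, and each fitting callable is assumed to return an object exposing a boolean `converged` attribute (or to raise on failure). Stub objects stand in for fitted models so the example runs without any modeling library.

```python
from types import SimpleNamespace


def fit_with_fallback(specs):
    """Try model specifications in order; return the first that converges.

    `specs` is a list of (label, fit_fn) pairs, ordered from the most to
    the least complex random-effects structure.
    """
    failures = []
    for label, fit_fn in specs:
        try:
            result = fit_fn()
        except Exception as exc:  # e.g. a numerical error during optimization
            failures.append((label, repr(exc)))
            continue
        if getattr(result, "converged", False):
            return label, result
        failures.append((label, "did not converge"))
    raise RuntimeError(f"No specification converged: {failures}")


def _crossed():
    # Stub: simulates a fit that fails outright
    raise ValueError("singular matrix")


chosen, fitted = fit_with_fallback([
    ("crossed", _crossed),
    ("user_only", lambda: SimpleNamespace(converged=False)),
    ("intercept_only", lambda: SimpleNamespace(converged=True)),
])
```

Returning the label alongside the result makes it straightforward to report which specification each final model actually used.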

Results

Model fit and validation

We fitted eight mixed-effects models (the four outcome-by-binning combinations, each in a main-effects and an interaction version) using crossed random effects (user intercepts, user slopes, time intercepts). The final models analyzed 3,190 observations across 53 weeks (weekly) and 1,313 observations across 13 months (monthly), representing 170–171 of the 179 users after excluding user-time periods with insufficient category volume. Model fit statistics are reported in Table 1.

Table 1.

Model fit statistics.

Model	Outcome	Time period	Model type	N (observations)	N (users)	Log-likelihood	AIC	BIC
M1a	Outlet Diversity	Weekly	Main Effects	3,190	170	310.74	−585.48	−476.26
M1a	Outlet Diversity	Weekly	+Interactions	3,190	170	275.48	−498.96	−341.20
M1b	Outlet Diversity	Monthly	Main Effects	1,313	171	218.19	−400.37	−307.13
M1b	Outlet Diversity	Monthly	+Interactions	1,313	171	195.00	−338.01	−203.33
M2a	Topic Diversity	Weekly	Main Effects	3,190	170	241.61	−447.22	−338.00
M2a	Topic Diversity	Weekly	+Interactions	3,190	170	206.55	−361.10	−203.34
M2b	Topic Diversity	Monthly	Main Effects	1,313	171	208.20	−380.40	−287.16
M2b	Topic Diversity	Monthly	+Interactions	1,313	171	183.80	−315.61	−180.92

Fixed effects explained 16–32% of variance (marginal R²), rising to 46–62% when random effects are included (conditional R²; see Appendix 4 for full model detail). This gap indicates that individual differences account for a substantial share of variance beyond the measured predictors, supporting our mixed-effects approach.
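For readers unfamiliar with the two quantities, the sketch below shows how marginal and conditional R² relate under the common variance-partition definition. The text does not state which estimator was used, so this definition is an assumption, as are the toy variance components; with random slopes, the random-effects variance term itself requires additional care that is not shown here.

```python
def r_squared(var_fixed, var_random, var_resid):
    """Return (marginal, conditional) R² from three variance components.

    Assumes the variance-partition definition: marginal R² counts only the
    variance of the fixed-effect predictions, conditional R² adds the
    variance attributed to random effects.
    """
    total = var_fixed + var_random + var_resid
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional


# Hypothetical variance components, not taken from the fitted models
marginal, conditional = r_squared(0.02, 0.03, 0.05)
```

The difference between the two values is the share of total variance attributed to the random effects.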

User random slopes showed substantial variance ( $σ = 0.132$ to $0.158$ ), indicating meaningful individual differences in diversity trajectories. User random intercepts were small ( $σ = 0.000$ to $0.021$ ), and time period effects modest ( $σ = 0.002$ to $0.016$ ). Residual variance remained the largest component ( $σ = 0.168$ to $0.208$ ).

Control variables showed consistent patterns. Reading volume had the strongest relationships ( $β = 0.061$ to $0.098$ , $p < 0.001$ ); because volume enters in logarithmic form, the positive coefficients imply diminishing returns to additional reading. Age showed small positive effects ( $β = 0.003$ to $0.006$ , $p < 0.05$ in most models). Women showed significantly lower topic-semantic diversity than men ( $β = -0.041$ to $-0.077$ , $p < 0.05$ in weekly models), but we found no consistent gender difference for outlet diversity. Education showed no significant associations.

Shapiro–Wilk tests indicated departures from normality ( $p < 0.001$ ), primarily reflecting moderate negative skewness. We consider these departures manageable because mixed-effects models are reasonably robust to non-normal residuals in large samples. Consistency across weekly and monthly aggregations provides additional robustness evidence.

Pathway effects on diversity (RQ1)

RQ1 examined whether different access pathways contribute different levels of diversity to users’ news diets. The within-between decomposition separates within-person effects—whether a user’s diversity changes when they shift toward a pathway relative to their own average—from between-person effects capturing stable individual differences that may reflect selection rather than pathway influence. We focus primarily on within-person effects, which provide the stronger basis for attributing differences to pathways themselves.

All diversity measures are scaled 0 to 1, with coefficients representing changes relative to direct access. Table 2 reports effects scaled per 1 standard deviation (SD) shift in pathway proportion, representing typical variation in our data and making effects comparable across pathways. Full model coefficients and fit statistics are reported in Appendix 4.

Table 2.

Pathway effects on content diversity (per 1 SD shift).

		Outlet-semantic diversity		Topic-semantic diversity
		RQ1	RQ2	RQ1	RQ2
		Pathway effect	Cumulative annual ∆	Pathway effect	Cumulative annual ∆
Panel A: Weekly Models (M1a / M2a)
Search	Within	+0.061***	−0.025^†	+0.017***	−0.022
	Between	+0.018	+0.008	−0.058***	+0.044^†
Social Media	Within	+0.035***	−0.035**	+0.014***	+0.004
	Between	+0.057***	+0.038^†	+0.019	+0.037^†
Messaging	Within	+0.014***	−0.010	+0.011**	+0.010
	Between	+0.006	+0.025	−0.014	+0.016
Email	Within	+0.009*	−0.004	+0.010**	−0.009
	Between	+0.009	−0.014	+0.005	+0.005
Panel B: Monthly Models (M1b / M2b)
Search	Within	+0.069***	−0.040*	+0.032***	−0.005
	Between	+0.043**	−0.001	−0.018	−0.003
Social Media	Within	+0.040***	−0.039*	+0.022***	−0.008
	Between	+0.068***	+0.062*	+0.037**	+0.027
Messaging	Within	+0.007	−0.004	+0.004	−0.019
	Between	+0.003	−0.002	−0.007	−0.004
Email	Within	+0.004	−0.003	+0.010^†	+0.015
	Between	+0.001	−0.023	−0.005	+0.014

Note. Effects are expressed relative to direct access (reference pathway), scaled per 1 SD shift in pathway proportion. RQ1 columns show the estimated change in diversity per 1 SD increase in pathway proportion, from Main Effects models (β × SD). RQ2 columns show the estimated cumulative change in diversity over 1 year from a sustained 1 SD above-average pathway proportion, from Interaction models (weekly: β × SD × 52; monthly: β × SD × 12). Negative RQ2 values indicate that the pathway’s diversity contribution erodes over time. Within-person effects capture diversity changes when the same user shifts toward a pathway relative to their own average; because these are purged of stable individual differences, they provide a stronger basis for attributing effects to the pathway itself. Between-person effects capture stable differences between users who habitually differ in pathway reliance; these may reflect unobserved individual characteristics rather than pathway effects per se. Full model coefficients are reported in Appendix 4.

†p < 0.10. *p < 0.05. **p < 0.01. ***p < 0.001.
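The scaling described in the table note is simple arithmetic, sketched below. The function names and the example coefficient and SD values are hypothetical and chosen only to illustrate the conversion; they are not estimates from the fitted models.

```python
# Number of observation periods per year for each binning, as in the table note
PERIODS_PER_YEAR = {"weekly": 52, "monthly": 12}


def per_sd_effect(beta, sd):
    """RQ1 columns: change in diversity per 1 SD shift in pathway proportion."""
    return beta * sd


def cumulative_annual_delta(beta_interaction, sd, binning):
    """RQ2 columns: cumulative change over one year of sustained +1 SD use."""
    return beta_interaction * sd * PERIODS_PER_YEAR[binning]


# Hypothetical inputs: a raw coefficient of 0.30, a per-period interaction
# coefficient of -0.002, and a pathway-proportion SD of 0.20
level = per_sd_effect(0.30, 0.20)
annual_weekly = cumulative_annual_delta(-0.002, 0.20, "weekly")
annual_monthly = cumulative_annual_delta(-0.002, 0.20, "monthly")
```

Because the interaction coefficient is expressed per period, the same raw value implies a larger annual change under weekly binning than under monthly binning, which is why the note multiplies by 52 or 12 respectively.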

Outlet-semantic diversity

Within-person effects indicate that all distributed pathways are associated with higher outlet-semantic diversity than direct access. Search shows the largest effects: a 1 SD within-person shift toward search is associated with an increase of 0.061 (weekly) to 0.069 (monthly). Social media shows the next-largest effects (+0.035 to +0.040). Messaging and email show smaller positive associations (+0.014 and +0.009) reaching significance in weekly models but attenuating at monthly aggregation.

Between-person effects complement this picture. Social media shows significant positive associations at both aggregation levels (+0.057 to +0.068, $p < 0.001$ ), reinforcing the within-person pattern. Search shows a weaker between-person association, non-significant weekly (+0.018) but significant at monthly aggregation (+0.043, $p < 0.01$ ). Messaging and email show no significant between-person effects.

Topic-semantic diversity

Within-person effects follow the same directional pattern at roughly one-third to one-half the magnitude. Search shows the largest effects (+0.017 to +0.032), followed by social media (+0.014 to +0.022). Messaging and email show small positive weekly associations (+0.010 to +0.011, $p < 0.01$ ) that do not replicate monthly (email is marginal). Across both dimensions, the within-person evidence indicates that distributed pathways, particularly search and social media, are associated with greater diversity than direct access.

The between-person effects reveal a notable sign reversal for search. While within-person shifts toward search are associated with higher topic diversity, habitual search users show significantly lower topic-semantic diversity in the weekly model (−0.058, $p < 0.001$ ), more than three times the magnitude of the corresponding within-person benefit (+0.017). This effect attenuates at monthly aggregation (−0.018, n.s.). Social media, by contrast, shows consistent positive associations at both levels, with between-person effects reaching significance monthly (+0.037, $p < 0.01$ ). Messaging and email show no significant between-person associations.

RQ2: Pathway-moderated diversity trajectories

RQ2 asks whether pathways moderate diversity trajectories over time—whether we observe filter bubble dynamics. The interaction models add pathway × time terms to test whether pathway-dependent diversity gains or losses accumulate over consumption histories. Results are in the RQ2 columns of Table 2; full model coefficients and fit statistics are reported in Appendix 4.

The base time effect was non-significant across all main effects models and most interaction models (one exception: weekly topic diversity, −0.002, $p < 0.05$ ). This null pattern indicates no universal diversity trend; any trajectory effects are pathway-specific.

Outlet-semantic diversity

Within-person increases in social media and search usage were associated with erosion of their diversity contributions over time. For social media, a sustained 1 SD above-average shift was associated with cumulative annual reductions of −0.035 ( $p < 0.01$ , weekly) and −0.039 ( $p < 0.05$ , monthly). Search showed a similar pattern at monthly aggregation (−0.040, $p < 0.05$ ), with a marginal weekly estimate (−0.025, $p < 0.10$ ). No other pathway showed significant within-person trajectory effects.

However, between-person estimates for social media point in the opposite direction. At monthly aggregation, habitual social media users showed increasing outlet diversity over time (+0.062, $p < 0.05$ ), with a marginal weekly estimate (+0.038, $p < 0.10$ ). This mirrors the search sign reversal on topic diversity in RQ1: within-person evidence is consistent with a filtering process, but between-person trajectories suggest that the characteristics of habitual users counteract this erosion. Search, by contrast, lacks this counterweight: despite comparable within-person erosion (−0.025 to −0.040), its between-person trajectory estimates are essentially zero (+0.008 and −0.001, both n.s.), suggesting habitual search users are not similarly insulated. No other pathway showed significant between-person trajectory effects.

Topic-semantic diversity

No pathway × time interaction reached conventional significance for topic-semantic diversity. Marginal between-person effects for search (+0.044, $p < 0.10$ ) and social media (+0.037, $p < 0.10$ ) appeared in weekly models only, without replicating monthly.

Summary

The limited within-person cumulative annual erosion described above is comparable in magnitude to the corresponding RQ1 level effects. Social media’s within-person outlet diversity boost of +0.035 per 1 SD shift (weekly) would be approximately offset after 1 year by the trajectory effect of −0.035. However, the between-person evidence suggests habitual social media users may be insulated from net diversity loss.
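The offsetting arithmetic can be made explicit with a short sketch. It assumes a purely linear reading of the Table 2 estimates (a constant level effect plus a trajectory effect that accumulates proportionally with time), and the helper names are hypothetical. Values beyond one year would be extrapolations outside the observed window.

```python
def net_effect(level, annual_delta, years):
    """Net per-SD diversity association after `years` of sustained use."""
    return level + annual_delta * years


def years_to_offset(level, annual_delta):
    """Years until the trajectory effect cancels the level effect.

    Returns None when the two effects share a sign (no offset occurs).
    """
    if level == 0:
        return 0.0
    if level * annual_delta >= 0:
        return None
    return -level / annual_delta


# Social media, outlet diversity, weekly within-person estimates (Table 2)
after_one_year = net_effect(0.035, -0.035, 1.0)
offset_years = years_to_offset(0.035, -0.035)
```

For the weekly social media estimates, the level effect and the cumulative annual change are equal in size and opposite in sign, so the net association reaches zero at exactly one year under this linear reading.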

Trajectory effects were confined to outlet-semantic diversity. Since this measure integrates source count, balance, and semantic disparity, the erosion reflects narrowing of substantive differences between consumed outlets’ coverage. That this does not extend to topic diversity suggests filtering operates on source repertoires while topical breadth remains fairly stable. Users may gravitate toward outlets providing increasingly overlapping content without narrowing the subjects they encounter.

Overall, however, interaction models consistently showed worse fit than main effects models by both Akaike information criterion (AIC) and Bayesian information criterion (BIC) (Table 1), indicating that pathway × time parameters do not improve model performance despite individually significant coefficients. The evidence for pathway-moderated filter bubble dynamics is thus limited primarily to within-person erosion of outlet-semantic diversity for social media and search, tempered by opposing between-person trends and poorer model fit.
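The fit comparison can be checked against Table 1 using the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L. The parameter counts used below (k = 18 for main effects, k = 26 with interactions) are not stated in the text; they are the values implied by the reported log-likelihoods and AICs for model M1a, so treat them as an inference rather than a reported quantity.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


# M1a (weekly outlet diversity), log-likelihoods and n from Table 1;
# k values are inferred, not reported
main_aic = aic(310.74, 18)
main_bic = bic(310.74, 18, 3190)
inter_aic = aic(275.48, 26)
inter_bic = bic(275.48, 26, 3190)

# Lower values indicate better fit; positive differences favor main effects
aic_gap = inter_aic - main_aic
bic_gap = inter_bic - main_bic
```

Under these assumptions the functions reproduce the tabulated AIC and BIC values for M1a to two decimals, and both gaps are positive, matching the statement that the interaction model fits worse on both criteria.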

Discussion

RQ1 asked whether different access pathways contribute different levels of content diversity. The answer is yes: all distributed pathways are associated with higher diversity than direct access when the same user shifts toward them, with search and social media showing the largest and most robust effects. RQ2 asked whether pathways moderate diversity trajectories over time. The answer is partially yes, but only for outlet diversity: within-person increases in search and social media use are associated with gradual erosion of their diversity contributions, while topic diversity trajectories show no pathway moderation.

Distributed pathways diversify news consumption

The within-person results consistently indicate that shifting toward any distributed pathway is associated with higher content diversity, with search showing the largest effects followed by social media, and messaging and email contributing smaller and less stable associations. Because within-person estimates are purged of stable individual characteristics, this provides a relatively strong basis for attributing diversity differences to the pathways themselves.

These findings align with and extend prior work. The search result supports the “automated serendipity” account (Fletcher and Nielsen, 2018) and evidence that search exposes users to unfamiliar sources (Ulloa and Kacperski, 2024; Wojcieszak et al., 2022). The social media result reinforces findings that social platforms diversify exposure through weak ties and incidental encounters (Flaxman et al., 2016; Fletcher and Nielsen, 2018; Lorenz-Spreen et al., 2023; Wojcieszak et al., 2022). Notably, direct access—the pathway most dependent on self-selection through established routines (Möller et al., 2020)—is associated with the narrowest consumption. While prior work suggests direct access maintains topical breadth through editorial curation (Dubois and Blank, 2018; Hüllmann and Sensmeier, 2022), our semantic measures suggest this breadth does not extend to the content level when compared against distributed alternatives, possibly reflecting the difference between categorical diversity (where editorial curation provides a range of topic labels) and semantic diversity (where self-selected outlets may cover more homogeneous journalistic terrain).

The within-between decomposition reveals selection dynamics

Two sign reversals highlight the importance of distinguishing pathway effects from selection effects. For topic diversity, within-person shifts toward search are associated with higher diversity, yet habitual search users show significantly lower topic diversity. For outlet diversity trajectories, within-person social media increases are associated with diversity erosion, yet habitual social media users show increasing outlet diversity over time. Both reversals carry the same logic: the characteristics of people who gravitate toward a pathway diverge from the pathway’s own effect.

The search reversal is consistent with evidence that while search exposes users to diverse content, engagement can reflect selective clicking toward preference-confirming material (Robertson et al., 2023). Habitual search users may rely on the pathway because it serves targeted information needs, gradually narrowing topic consumption even as individual sessions introduce variety. This parallels Jürgens and Stark (2022), who found that short-term platform use increases diversity, while long-term heavy reliance produces divergent effects. Both analyses reveal an analogous pattern where a pathway’s diversity association reverses across levels of analysis.

The social media trajectory reversal admits a similar interpretation. The within-person erosion suggests that social media’s diversity contribution diminishes with increased reliance, consistent with engagement-level filtering through algorithmic convergence and network clustering (Barberá, 2020; Bechmann and Nielbo, 2018; Kitchens et al., 2020). Yet the positive between-person trajectory suggests habitual social media news consumers broaden their source base over time, likely reflecting their profile—more digitally engaged and multi-platform (Ross Arguedas et al., 2022). Search, by contrast, shows comparable within-person erosion but no countervailing between-person trajectory, making it the distributed pathway whose long-term diversity profile most closely resembles a filtering process—albeit a modest one confined to outlet diversity.

Outlet diversity shows pathway-moderated convergence; topic diversity does not

The within-person erosion effects for search and social media suggest their diversity contributions diminish with sustained above-average use—consistent with filter bubble dynamics. That the cumulative annual erosion roughly offsets the corresponding level effects illustrates the practical significance: a user sustaining elevated social media use for a year may see the pathway’s diversity advantage largely neutralized.

This interpretation warrants caution. First, interaction models consistently showed worse fit by AIC and BIC. Second, erosion is concentrated in outlet diversity, with no significant pathway moderation for topic diversity. Third, the base time trend was non-significant, indicating no general drift toward narrower consumption. Together, these patterns suggest filtering dynamics are modest, dimension-specific, and pathway-dependent rather than universal.

The outlet-topic divergence carries substantive implications. The erosion reflects narrowing not merely in source count but in semantic distinctiveness of consumed outlets’ coverage. That this does not extend to topic diversity suggests filtering operates on source repertoires and coverage character rather than subject range. Users may gravitate toward outlets with increasingly overlapping content—similar framings, perspectives, or editorial foci—without narrowing the topics they read about. If sustained, this could mean citizens maintain awareness of diverse issues while gradually losing access to diverse perspectives on those issues, a form of narrowing invisible to categorical diversity measures.

Advancing diversity measures

Traditional measures based on counts or balance of exposure across categories risk misestimating content diversity by treating within-category exposures as equivalent, potentially registering “shallow” diversity where pathways expose users to categorically distinct but semantically similar content. Our approach addresses this by integrating content embeddings into Stirling’s framework, combining variety, balance, and semantic disparity.

The persistence of substantial within-person pathway effects under these measurements demonstrates that distributed pathways provide deep rather than shallow diversification. They do not merely shuffle users across categorical boundaries but expose them to semantically diverse content within and across them, enriching repertoires with varied journalistic substance. This strengthens existing evidence by showing that pathway effects survive a more stringent test than categorical measures provide. The magnitude difference between dimensions is itself informative: within-person effects for topic diversity were roughly one-third to one-half those for outlet diversity, suggesting pathways diversify sources more readily than subject matter, consistent with introducing users to unfamiliar outlets covering familiar topics.

This approach introduces dependencies for future adopters. Diversity scores are sensitive to embedding model choice and depend on access to news article content. These considerations highlight the additional complexity of content-level diversity measurement without undermining its core advantage over categorical alternatives.

Limitations and future directions

Our convenience sample skews younger, more female, and more educated than Dutch population averages, limiting generalizability. The approximately 1-year temporal scope—longer than most trace-data studies, to our knowledge—cannot capture effects emerging over multiple years; filtering processes may accumulate or reverse over longer horizons. Reliance on Google account–linked browsing histories restricts coverage to a subset of information diets; future work should pursue multi-platform data donations and complement trace data with surveys or interviews to illuminate motivations shaping pathway use.

By defining news traces as visits to institutional news source domains, our analysis excludes non-institutional sources and other digital channels where filter bubble dynamics may also operate. Our null topic-diversity findings should be interpreted accordingly—they apply to the institutional news segment we observe. That said, our 258 unique news domains span international, national, and local coverage, and we do detect significant outlet-diversity convergence under the same scope restriction, suggesting the null topic result is not straightforwardly attributable to limited source variation.

Donated browsing data carry systematic limitations (Clemm von Hohenberg et al., 2024), but our in-person, researcher-mediated collection reduced reactivity effects and avoided data quality issues of online panels. Web browsing data also address the unreliability of self-reported media consumption, and sustaining our analysis’ temporal granularity and semantic depth required actual browsing histories. As Welbers et al. (2024) demonstrate, researcher-assisted field settings, such as the one that we pursued, improve both participation rates and data quality.

While the within-between decomposition accounts for stable individual differences and the random effects capture individual trajectories and extrinsic temporal factors, idiosyncratic time-varying factors could co-occur with pathway shifts. The within-person estimates should be interpreted as closer to causal effects than aggregate associations, but not as definitive causal identification.

The pathway affordances documented here are themselves evolving. Search engines increasingly deliver AI-synthesized answers rather than ranked source links, potentially diminishing the diversity-enhancing mechanisms we document. Social media platforms are shifting toward fully algorithmic mediation decoupled from social network structure. These developments may fundamentally alter pathway filtering dynamics, making longitudinal monitoring particularly important.

Conclusion

Our findings support the filter bubble concept’s conditional utility for digital media research. When operationalized as measurable temporal changes in diversity with sufficient granularity, the concept provides a straightforward test for convergence toward narrower consumption. We detected within-person pathway-moderated convergence for outlet diversity that would be invisible without temporal sensitivity, while establishing null pathway moderation for topic diversity—illustrating the concept’s capacity both to detect filtering dynamics and to provide null evidence.

Our results also reveal that aggregate pathway–diversity associations can obscure divergent processes. Search is associated with higher topic diversity at the episode level but lower diversity among habitual users; social media’s outlet diversity contribution erodes within-person but grows among habitual users over time. These reversals underscore the value of decomposing aggregate associations in digital media research.

At the same time, our focus on the filter bubble concept represents a fundamental conceptual limitation. As Slater (2015) and Trilling (2024) argue, media effects involve non-linear dynamics where consumption choices create feedback loops reshaping preferences and information environments. Our linear approach cannot capture these dynamics; the substantial residual variance ( $σ = 0.168$ to $0.208$ ) suggests that consumption patterns beyond our framework’s reach remain unexplained, potentially including non-linear trajectories. Future research should develop approaches for “rabbit holes” and other non-monotonic phenomena that threaten information diversity but exceed filter bubble theory’s definitional boundaries. Nevertheless, such concepts introduce considerable measurement challenges, and they should complement rather than replace a well-operationalized filter bubble test.

ORCID iDs

Rupert Kiddle

Damian Trilling

Ethical considerations

This project was ethically approved by Vrije Universiteit Amsterdam.

Consent to participate

Participant consent was collected during the data donation process.

Author contributions

Material preparation, collection of additional data (web scraping), and analysis were performed by Rupert Kiddle. The first draft of the manuscript was written by Rupert Kiddle, and all other authors (Anne Kroon, Kasper Welbers, Damian Trilling) commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 947695).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data that support the findings of this study cannot be made publicly available because they contain sensitive personal data (web browsing history) protected under GDPR. Collection of trace data through donation is enabled by Article 15 of the European Union’s 2018 GDPR, which allows individuals to access and transfer their information to third parties (Araujo et al., 2022).

Code availability

The code used to perform the analyses in this paper is available at .

Author biographies

Rupert Kiddle (MSc Research, University of Amsterdam, 2022) is a research associate at the Vrije Universiteit Amsterdam, where he is completing his PhD in Computational Communication Science, and is the editorial assistant at Computational Communication Research. His research focuses on the evolving state of news consumption, the dynamics of digital architecture, and social simulation using generative frameworks.

Anne Kroon (PhD, University of Amsterdam, 2017) is an associate professor at the Amsterdam School of Communication Research (ASCoR), University of Amsterdam. Her research examines bias in digital media using computational techniques and experiments, with a focus on algorithmic bias in recruitment and the content and consequences of biased representations of minorities in media.

Kasper Welbers (PhD, Vrije Universiteit Amsterdam, 2016) is an associate professor at Vrije Universiteit Amsterdam, co-director of the Societal Analytics Lab, and vice chair of the Computational Methods division at the International Communication Association. His research focuses on news flows and gatekeeping, and the development and validation of computational content analysis and data donation methodology.

Damian Trilling (PhD, University of Amsterdam, 2013) is a professor at Vrije Universiteit Amsterdam, where he holds the chair for Journalism Studies. His research interests focus on the use and dissemination of news in the current media ecosystem, with a specific focus on innovative methodological approaches.

References

Araujo

Ausloos

van Atteveldt

, et al. (2022) OSD2F: An open-source data donation framework. Computational Communication Research 4(2): 372–387. https://doi.org/10.5117/CCR2022.2.001.ARAU

Barberá (2020) Social media, echo chambers, and political polarization. In: Persily and Tucker (eds) Social Media and Democracy. Cambridge University Press, pp. 34–55. https://doi.org/10.1017/9781108890960.004

Beam, Hutchens and Hmielowski (2018) Facebook news and (de)polarization: Reinforcing spirals in the 2016 US election. Information, Communication & Society 21(7): 940–958. https://doi.org/10.1080/1369118X.2018.1444783

Bechmann and Nielbo (2018) Are we exposed to the same “news” in the news feed? An empirical analysis of filter bubbles as information similarity for Danish Facebook users. Digital Journalism 6(8): 990–1002. https://doi.org/10.1080/21670811.2018.1510741

Bruns (2019) Are Filter Bubbles Real? Polity Press.

Cardenal, Aguilar-Paredes, Galais, et al. (2019) Digital technologies and selective exposure: How choice and filter bubbles shape news media exposure. The International Journal of Press/Politics 24(4): 465–486. https://doi.org/10.1177/1940161219862988

Cinelli, De Francisci Morales, Galeazzi, et al. (2021) The echo chamber effect on social media. Proceedings of the National Academy of Sciences 118(9): e2023301118. https://doi.org/10.1073/pnas.2023301118

Clemm von Hohenberg, Stier, Cardenal, et al. (2024) Analysis of web browsing data: A guide. Social Science Computer Review 42(6): 1479–1504. https://doi.org/10.1177/08944393241227868

Commissariaat voor de Media (2024) Media Monitor 2024. Technical report, Commissariaat voor de Media. Available at: https://www.cvdm.nl/mediamonitor/english/

Dahlgren (2021) A critical review of filter bubbles and a comparison with selective exposure. Nordicom Review 42(1): 15–33. https://doi.org/10.2478/nor-2021-0002

Dubois and Blank (2018) The echo chamber is overstated: The moderating effect of political interest and diverse media. Information, Communication & Society 21(5): 729–745. https://doi.org/10.1080/1369118X.2018.1428656

Flaxman, Goel and Rao (2016) Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80(S1): 298–320. https://doi.org/10.1093/poq/nfw006

Fletcher and Nielsen (2018) Are people incidentally exposed to news on social media? A comparative analysis. New Media & Society 20(7): 2450–2468. https://doi.org/10.1177/1461444817724170

Fletcher, Kalogeropoulos and Nielsen (2023) More diverse, more politically varied: How social media, search engines and aggregators shape news repertoires in the United Kingdom. New Media & Society 25(8): 2118–2139. https://doi.org/10.1177/14614448211027393

Grootendorst (2022) BERTopic: Neural topic modeling with a class-based TF-IDF procedure. https://doi.org/10.48550/arXiv.2203.05794

Habermas (2010) The concept of human dignity and the realistic utopia of human rights. Metaphilosophy 41(4): 464–480. https://doi.org/10.1111/j.1467-9973.2010.01648.x

Haim, Breuer and Stier (2021) Do news actually “find me”? Using digital behavioral data to study the news-finds-me phenomenon. Social Media + Society 7(3). https://doi.org/10.1177/20563051211033820

Hartmann, Wang, Pohlmann, et al. (2025) A systematic review of echo chamber research: Comparative analysis of conceptualizations, operationalizations, and varying outcomes. Journal of Computational Social Science 8(2). https://doi.org/10.1007/s42001-025-00381-z

Hüllmann and Sensmeier (2022) No filter bubbles? Evidence from an online experiment on the news diversity of personalizing news aggregators. In: Proceedings of the Pacific Asia Conference on Information Systems (PACIS 2022). Association for Information Systems, pp. 1–17. https://aisel.aisnet.org/pacis2022/111/

Jürgens and Stark (2022) Mapping exposure diversity: The divergent effects of algorithmic curation on news consumption. Journal of Communication 72(3): 322–344. https://doi.org/10.1093/joc/jqac009

Kalogeropoulos (2021) Who shares news on mobile messaging applications, why and in what ways? A cross-national analysis. Mobile Media & Communication 9(2): 336–352. https://doi.org/10.1177/2050157920958442

Kitchens, Johnson and Grey (2020) Understanding echo chambers and filter bubbles: The impact of social media on diversification and partisan shifts in news consumption. Management Information Systems Quarterly 44(4): 1619–1649. https://doi.org/10.25300/MISQ/2020/16371

Le, Maragh, Ekdale, et al. (2019) Measuring political personalization of Google news search. In: The World Wide Web Conference. ACM, pp. 2957–2963. https://doi.org/10.1145/3308558.3313682

Loecherbach, Moeller, Trilling, et al. (2020) The unified framework of media diversity: A systematic literature review. Digital Journalism 8(5): 605–642. https://doi.org/10.1080/21670811.2020.1764374

Loecherbach, Moeller, Trilling, et al. (2024) What is news? Mapping the diversity of news experiences in digital trace data. Journalism 27(2): 407–425. https://doi.org/10.1177/14648849241303115

Lorenz-Spreen, Oswald, Lewandowsky, et al. (2023) A systematic review of worldwide causal and correlational evidence on digital media and democracy. Nature Human Behaviour 7(1): 74–101. https://doi.org/10.1038/s41562-022-01460-1

Michiels, Leysen, Smets, et al. (2022) What are filter bubbles really? A review of the conceptual and empirical work. In: Adjunct Proceedings of the 30th ACM conference on user modeling, adaptation and personalization. ACM, pp. 274–279. https://doi.org/10.1145/3511047.3538028

Michiels, Vannieuwenhuyze, Leysen, et al. (2023) How should we measure filter bubbles? A regression model and evidence for online news. In: Proceedings of the 17th ACM conference on recommender systems. ACM, pp. 640–651. https://doi.org/10.1145/3604915.3608805

Möller, van de Velde, Merten, et al. (2020) Explaining online news engagement based on browsing behavior: Creatures of habit? Social Science Computer Review 38(5): 616–632. https://doi.org/10.1177/0894439319828012

Monti, D’Ignazi, Starnini, et al. (2023) Evidence of demographic rather than ideological segregation in news discussion on Reddit. In: Proceedings of the ACM web conference 2023. ACM, pp. 2777–2786. https://doi.org/10.1145/3543507.3583468

Mundlak (1978) On the pooling of time series and cross section data. Econometrica 46(1): 69–85. https://doi.org/10.2307/1913646

Newman, Fletcher, Robertson, et al. (2024) Reuters Institute digital news report 2024. Technical report, Reuters Institute for the Study of Journalism. https://doi.org/10.60625/RISJ-VY6N-4V57

Nikolov, Lalmas, Flammini, et al. (2019) Quantifying biases in online information exposure. Journal of the Association for Information Science and Technology 70(3): 218–229. https://doi.org/10.1002/asi.24121

Nussbaum, Morris, Duderstadt, et al. (2025) Nomic embed: Training a reproducible long context text embedder. Transactions on Machine Learning Research. https://doi.org/10.48550/arXiv.2402.01613

Pariser (2011) The Filter Bubble: What the Internet Is Hiding from You. Penguin Press.

Pfiffner and Friemel (2023) Leveraging data donations for communication research: Exploring drivers behind the willingness to donate. Communication Methods and Measures 17(3): 227–249. https://doi.org/10.1080/19312458.2023.2176474

Robertson, Green, Ruck, et al. (2023) Users choose to engage with more partisan news than they are exposed to on Google Search. Nature 618(7964): 342–348. https://doi.org/10.1038/s41586-023-06078-5

Ross Arguedas, Robertson, Fletcher, et al. (2022) Echo chambers, filter bubbles, and polarisation: A literature review. Technical report, Reuters Institute for the Study of Journalism. https://doi.org/10.60625/RISJ-ETXJ-7K60

Scharkow, Mangold, Stier, et al. (2020) How social network sites and other online intermediaries increase exposure to news. Proceedings of the National Academy of Sciences 117(6): 2761–2763. https://doi.org/10.1073/pnas.1918279117

Slater (2015) Reinforcing spirals model: Conceptualizing the relationship between media content exposure and the development and maintenance of attitudes. Media Psychology 18(3): 370–395. https://doi.org/10.1080/15213269.2014.897236

Smets (2022) Designing for serendipity: A means or an end? Journal of Documentation 79(3): 589–607. https://doi.org/10.1108/JD-12-2021-0234

Steiner, Magin, Stark, et al. (2022) Seek and you shall find? A content analysis on the diversity of five search engines’ results on political queries. Information, Communication & Society 25(2): 217–241. https://doi.org/10.1080/1369118X.2020.1776367

Stirling (2007) A general framework for analysing diversity in science, technology and society. Journal of the Royal Society Interface 4(15): 707–719. https://doi.org/10.1098/rsif.2007.0213

Sunstein (2017) #Republic: Divided Democracy in the Age of Social Media. Princeton University Press.

Terren and Borge-Bravo (2021) Echo chambers on social media: A systematic review of the literature. Review of Communication Research 9. https://doi.org/10.12840/ISSN.2255-4165.028

Thurman, Moeller, Helberger, et al. (2019) My friends, editors, algorithms, and I. Digital Journalism 7(4): 447–469. https://doi.org/10.1080/21670811.2018.1493936

Trielli and Diakopoulos (2019) Search as news curator: The role of Google in shaping attention to news information. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, pp. 1–15. https://doi.org/10.1145/3290605.3300683

Trilling (2024) Communicative feedback loops in the digital society. Weizenbaum Journal of the Digital Society 4(2). https://doi.org/10.34669/WI.WJDS/4.2.4

Ulloa and Kacperski (2024) Search engine effects on news consumption: Ranking and representativeness outweigh familiarity in news selection. New Media & Society 26(11): 6552–6578. https://doi.org/10.1177/14614448231154926

Van Hoof, Meppelink, Moeller, et al. (2022) Searching differently? How political attitudes impact search queries about political issues. New Media & Society 26(7): 3728–3750. https://doi.org/10.1177/14614448221104405

Vermeer, Trilling, Kruikemeier, et al. (2020) Online news user journeys: The role of social media, news websites, and topics. Digital Journalism 8(9): 1114–1141. https://doi.org/10.1080/21670811.2020.1767509

Welbers, Loecherbach, Lin, et al. (2024) Anything you would like to share: Evaluating a data donation application in a survey and field study. Computational Communication Research 6(2). https://doi.org/10.5117/CCR2024.2.5.WELB

Wojcieszak, Menchen-Trevino, Goncalves JFF, et al. (2022) Avenues to news and diverse news exposure online: Comparing direct navigation, social media, news aggregators, search queries, and article hyperlinks. The International Journal of Press/Politics 27(4): 860–886. https://doi.org/10.1177/19401612211009160

Wu, Wu, Huang, et al. (2023) Personalized news recommendation: Methods and challenges. ACM Transactions on Information Systems 41(1): 1–50. https://doi.org/10.1145/3530257

Zuiderveen Borgesius, Trilling, Möller, et al. (2016) Should we worry about filter bubbles? Internet Policy Review 5(1). https://doi.org/10.14763/2016.1.401