Abstract
This study examines how the #MeToo movement reshaped long-term public discourse on sexual violence in South Korea. While research has documented the surge of attention following prosecutor Seo Ji-hyun’s 2018 disclosure, little is known about how the movement reorganized the broader discursive field beyond the initial moment of crisis. Drawing on discursive institutionalism, we argue that #MeToo acted as a discursive shock that consolidated previously fragmented conversations into more coherent and durable interpretive frameworks. Using an 8-year corpus of 351,582 Korean-language tweets (2015–2023), we apply Dirichlet Multinomial Regression topic modeling and time-series analysis to identify shifts in both topical content and discursive structure. Our findings show a transition from episodic, emotion-driven narratives to stable, thematic frames emphasizing human rights, systemic discrimination, and institutional accountability. We further demonstrate increasing convergence between social media and mainstream news, indicating diffusion and stabilization of these new frames. The results suggest that digital activism can produce enduring transformations in public meaning, illuminating how social movements institutionalize discourse over time.
Introduction
Public discourse on sexual violence in South Korea has undergone a profound transformation over the past decade, culminating in the watershed moment of the #MeToo movement in early 2018. While prior research has richly documented the surge of accusations, media attention, and public debate that followed prosecutor Seo Ji-hyun’s televised disclosure (JTBC News, 2018), less is known about how the movement reorganized the long-term discursive structure surrounding sexual violence. Existing computational studies of #MeToo tend to focus on short periods during or immediately after the movement’s emergence, capturing heightened visibility but not the deeper discursive realignments that shape public meaning over time. As a result, we still lack an account of how the movement altered the architecture of public discourse: what narratives stabilized, which frames receded, and how digital publics came to interpret new incidents within these reorganized frameworks.
This paper advances the argument that #MeToo in Korea did more than spark a temporary surge of attention: it introduced a new institutionalized discursive order around sexual violence. Drawing on theories of discursive institutionalism and agenda formation, we conceptualize the #MeToo moment as a discursive shock that absorbed pre-existing, fragmented conversations and reconfigured them into more coherent, stable, and structurally framed narratives. In this view, social media was not merely a channel for public reaction but a site where evolving “rules of meaning” solidified: where episodic outrage was transformed into thematic, institutionally grounded frames emphasizing power, human rights, and accountability.
This theoretical perspective complements and extends prior research on feminist digital activism, which has shown that hashtag movements can mobilize publics, challenge stigma, and elicit political responses. Over the past decade, feminist movements have attracted substantial public attention, facilitated in large part by the connective infrastructure of digital and social media platforms (Jackson et al., 2020; Keller et al., 2018). The hashtag, now a routine tool for signaling shared concerns and navigating vast streams of online content, has emerged as the defining mechanism of connective action on Twitter (Jackson et al., 2020). Feminist activists have drawn extensively on this mechanism: campaigns such as #GirlsLikeUs (2012), #YesAllWomen (2014), #WhyIStayed (2014), #YouOKSis (2014–2015), and #EverydaySexism (2015) illustrate its widespread adoption. Each of these movements has cultivated forms of collective identity that parallel, in important respects, those traditionally associated with social movement organizations (de Almeida, 2019; Jackson et al., 2020; Mendes et al., 2018). Crucially, however, the interpretive resources these digital movements deploy, particularly the reframing of gender-based violence as a structural and rights-based issue, were not invented on social media. They draw on decades of transnational feminist advocacy that established “women’s rights as human rights” as a normative framework (Bunch, 2017) and on the cross-border activist networks through which such frames diffused into domestic policy arenas worldwide (Keck & Sikkink, 2014; True & Mintrom, 2001). Hashtag activism, in this sense, represents a new channel for the circulation of framing resources whose origins lie in longer histories of organized feminist mobilization.
We build on this work by demonstrating how such movements also generate long-term institutionalization of discourse, visible in the stabilization of thematic frames, the alignment of social media and mainstream news agendas, and the decline of volatile, event-driven narratives. Moreover, by tracing 8 years of Korean Twitter (now X) discourse from 2015 to 2023, we show how the movement reorganized collective memory: shifting public interpretation away from isolated scandals and toward structural understandings of gendered violence. Empirically, we analyze 351,582 tweets using Dirichlet Multinomial Regression (DMR) topic modeling combined with time-series analysis. This longitudinal approach allows us to examine not only what topics emerge, but how they persist, stabilize, or fade over time. We identify two overarching discursive formations, namely, personal/emotional narratives and public/structural frames, and show how their relative prominence changes before and after 2018. We further demonstrate that the #MeToo moment marks a pivot from fragmented, episodic discourse toward more institutionalized thematic frames, reflected in rising attention to human rights violations, legal accountability, and systemic inequalities.
By theorizing discursive institutionalization and providing computational evidence for its mechanisms, this study offers a new explanation for how digital activism reshapes public understandings of gendered violence—not only during moments of crisis but also across longer arcs of cultural change. The findings also contribute to broader discussions on how social movements reorder public meaning, demonstrating that hashtag activism can generate durable transformations in the discursive landscape.
Literature Review
The #MeToo Movement
The global #MeToo movement began in October 2017, when American actor Alyssa Milano used social media to expose sexual harassment and assault by film producer Harvey Weinstein. It quickly gained international traction as individuals shared their experiences under the hashtag #MeToo, raising awareness, challenging stigmatizing narratives surrounding sexual violence, and demanding action against sexual violence (Kim & Hong, 2019; Li et al., 2021; Manikonda et al., 2018; Rister & McClure, 2019; Yoon & Ahn, 2022).
Feminist media scholarship highlights how representations of sexual violence shape public understanding and policy responses. Studies of high-profile cases in Western contexts show persistent tendencies toward victim-blaming, personalization, and emphasis on reputational damage to accused men (Boyle, 2019; Cuklanz, 2020). Even in contexts with strong gender-equality norms, such as Scandinavia, media coverage has often framed sexual violence as individual misconduct rather than systemic inequality (Askanius, 2019). At the same time, feminist scholars emphasize that definitions of sexual violence are historically contingent and shaped by shifting cultural and political contexts (Heberle, 2009; Kelly, 1987). These insights underscore the importance of examining how movements such as #MeToo transform not only the visibility of sexual violence but also the interpretive frameworks through which it is understood.
Research in Asian contexts suggests that the movement’s effects are uneven and mediated by local political cultures. For example, Lee (2020a) found that while South Korean news coverage showed increased attention to survivors’ perspectives after #MeToo, broader gender biases in reporting persisted. Hasunuma and Shin (2019) compare the movement’s trajectory in South Korea and Japan, showing that #MeToo gained far greater traction in South Korea—where women came forward publicly, mass protests were mobilized, and concrete legislative reforms followed—than in Japan, where fewer women spoke out, many remained anonymous, and the movement remained largely confined to journalism. They argue that these divergent outcomes reflect differences in women’s engagement in civil society and the nature of media coverage: South Korean media gave victims a prominent platform that amplified the movement, whereas Japanese media coverage tended to reinforce victim-blaming narratives. In a subsequent analysis, Shin (2021) deepens this account by emphasizing the role of a new generation of young Korean feminists as the driving force behind the movement’s strength, showing how their collective experiences of online sexual exploitation and misogyny—crystallized by events such as the 2016 Gangnam Station femicide—made them powerful agents of social change who pushed the movement beyond individual denunciations toward demands for structural reform of gendered power relations. Together, these studies highlight a key tension in global #MeToo dynamics: while the movement promotes structural understandings of gendered violence, media systems, political institutions, and generational dynamics may shape whether and how these frames take hold.
Together, this body of work suggests that #MeToo operates not only as a moment of mobilization but also as a political process through which activists, media institutions, and publics contest the meaning of sexual violence. However, most studies focus on short-term reactions or individual cases. We still know little about whether and how these discursive shifts stabilize over time, particularly in digital public spheres where attention is often assumed to be volatile. Addressing this gap requires longitudinal analysis capable of tracing how movements reshape the discursive architecture through which societies interpret gendered violence.
#MeToo in Korea
In Korea, the #MeToo movement gained prominence in January 2018, when prosecutor Seo Ji-hyun publicly disclosed sexual harassment within the Prosecutors’ Office during a televised interview (Lee et al., 2019; Yoon & Ahn, 2022). This sparked widespread discourse on social media, with accusations against high-profile figures and a broader reckoning with systemic issues of sexual violence.
Yet, discussions of sexual violence on Korean social media predate the #MeToo movement. Social movements often build momentum from cumulative incidents, eventually catalyzed by specific events (Diani, 1992; Kang et al., 2023). The resurgence of feminist activism and gender conflicts—such as the emergence of the feminist online community Megalia in 2015, the 2016 Gangnam Station murder controversy, and the “#Sexual_Violence_in_OO_Sector” campaign starting in October 2016—laid the groundwork for the later movement in Korea (Gyeonggido Women’s Development Institute, 2018; Lee, 2018). While #MeToo is distinct in its focus on voluntary public allegations of sexual violence often tied to power dynamics, it is part of a broader continuum of gendered power struggles and revelations of sexual violence, echoing earlier campaigns. Therefore, examining both pre-2018 discourse and subsequent developments is crucial to understanding its broader significance.
To capture this continuity and change, this study examines how users’ perceptions of specific topics form and evolve through an analysis of short, hashtag-based messages on Korean Twitter. It analyzes an 8-year Twitter corpus (2015–2023) to explore how issues related to sexual violence and gender emerged and developed on the platform.
While prior studies focused primarily on early #MeToo discourse from late 2017 to 2018 (Bae et al., 2021; Baik et al., 2022; Bogen et al., 2022; Goel & Sharma, 2020; Kwak & Kim, 2019; Lee et al., 2019), this study offers a comprehensive longitudinal examination of how discourse on sexual violence has evolved from 2015 to early 2023. By comparing pre- and post-movement discourse on the same platform, it situates the movement within a broader social context of violence and misogyny in Korea. In doing so, it highlights not only the moment of rupture but also the longer trajectories of continuity, diffusion, and transformation in how Korea has discussed sexual violence over the past decade.
Previous quantitative research on the #MeToo movement has relied largely on social media data, news articles, or surveys. Among studies of Korean Twitter texts, Kwak and Kim (2019) used Latent Dirichlet Allocation (LDA) to compare tweets containing “#MeToo” across two separate 3-day periods, one in March 2018 and one in August 2018. They identified ten topics for each period and observed a shift from broad common interest to polarized views, with some users supporting the movement and others developing negative attitudes toward it. 1 Lee et al. (2019) conducted a network analysis of tweets generated from January 28, 2018, to March 24, 2018, focusing on tweets with high betweenness centrality. 2 They found that a tweet about a webtoon site incident, 3 which was not covered by mainstream media, received the most empathy, showcasing social media’s ability to independently set and spread agendas on issues of sexual violence and gender discrimination. They also identified key topics, such as calls for reinvestigation of past cases like the Jang Ja-yeon case, resistance to the dismissal of the movement as a “political maneuver” 4 or a form of “defamation,” 5 and concerns about the movement’s potential to hastily condemn innocent people and incite gender conflicts.
Some research, such as Baik et al. (2022), suggests that #MeToo’s influence may be limited in reaching beyond its traditional core supporters. They compared tweets produced before (April 1 to October 8, 2017) and after (October 21, 2017, to March 31, 2018) the movement’s emergence and found that while discourse on sexual violence increased significantly among supporters, it showed minimal increase among neutral users, indicating limited “spillover” effects.
Other studies have examined how responses to #MeToo differ across platforms. Bae et al. (2021) compared the “disparagement” frame in Korean Twitter discourse and online news comments from January 1 to March 31 of 2018. They found that while both platforms exhibited this frame, Twitter users expressed stronger empathy and support for the movement. Similarly, Manikonda et al. (2018) observed that while Reddit users (an English-language online community) focused more on sharing personal experiences, Twitter users expressed greater empathy and support through hashtags and the sharing of related articles.
Yoon and Ahn (2022) conducted a comprehensive analysis of 6,897 #MeToo-related news articles published between the movement’s start and June 2020, which allowed a longer-term analysis than the social media studies mentioned above. Employing TF-IDF keyword analysis and word co-occurrence network analysis, they identified four primary thematic clusters: challenges to gender-discriminatory social structures, social repercussions, exposure of victimization, and political responses. They argued that the presence of gender- and human rights-related words within the “challenges to gender-discriminatory social structures” cluster suggests that the media covered the #MeToo movement not just as personal experiences or isolated cases but as a broader societal structural issue.
Unlike most studies focusing mainly on the immediate aftermath of the movement, this study offers a comprehensive longitudinal analysis of responses to sexual violence on Korean social media spanning from 2015 to March 2023. By detailing data collection, processing, and modeling methods, the study enhances the reliability of its findings and offers a comparable foundation for future research. As the most extensive analysis of #MeToo discourse on Korean Twitter to date, it aims to validate and build upon earlier studies that examined shorter periods or different platforms.
Discursive Institutionalization
While prior studies document how #MeToo mobilized attention and shaped public conversations, fewer have examined how such movements restructure long-term discourse. To theorize this deeper transformation, we draw on scholarship in discursive institutionalism, which conceptualizes public discourse as a site where norms, interpretive schemas, and shared meanings become stabilized and embedded within social and political life (Schmidt, 2008, 2010). From this perspective, discourse is not simply expressive, conveying the public’s emotions and reactions to events; it is also constitutive, shaping the very boundaries of what counts as a public problem and how it should be understood.
Discursive institutionalism emphasizes two processes highly relevant for hashtag activism: (1) reframing, in which new narratives reinterpret existing issues through alternative lenses (e.g., power, rights, and systemic inequality); and (2) stabilization, in which certain frames persist, diffuse across media and institutions, and gradually become taken-for-granted interpretive norms.
These processes help explain why social movements often generate enduring cultural change even after visible mobilization wanes. In the case of #MeToo, the shift from individual scandals to structural critiques suggests that the movement functioned as a “discursive shock” that reorganized the interpretive field surrounding gendered violence. Rather than remaining episodic, conversations began to adopt thematic, institutionalized frames, echoing broader patterns identified in agenda-setting theory and public sphere research (Entman, 1993; Hilgartner & Bosk, 1988; Schmidt, 2010).
Digital activism research has begun to explore similar dynamics. Hashtag movements such as #MeToo, #BlackLivesMatter, and #NiUnaMenos provide “connective action” frames (Bennett & Segerberg, 2023), through which dispersed individuals generate shared meaning. Scholars note that these movements not only amplify marginalized voices but also produce new narrative infrastructures—common vocabularies, diagnostic frames, and interpretive horizons—that shape how subsequent events are described and understood (Jackson et al., 2020; Mendes et al., 2018). In feminist movements specifically, hashtags provide mechanisms for reframing personal experiences as systemic harms, enabling the networked feminist storytelling that Mendes et al. (2018) describe as blurring the boundaries between private and political.
Yet few studies have examined whether these frames stabilize over time or whether platforms like Twitter simply generate cyclical bursts of attention. Most #MeToo studies in Korea have focused on short-term spikes in discourse immediately following high-profile accusations (Bae et al., 2021; Baik et al., 2022; Kwak & Kim, 2019; Lee et al., 2019). These works effectively capture agenda-setting moments but leave unanswered the question of how public meaning changes over longer periods, particularly as emotional reactions give way to institutional critiques.
By integrating discursive institutionalism with long-term computational analysis, this study contributes to a growing body of research arguing that digital activism can produce durable reorganizations of public discourse (Jackson et al., 2020). We extend this literature by showing how Korean #MeToo consolidated previously fragmented conversations, absorbed earlier feminist and anti-misogyny frames, and produced thematic interpretations of sexual violence grounded in human rights and power structures. In this account, the institutionalization of new discursive norms proceeds through a sequence of interconnected processes: first, a discursive shock—such as Seo Ji-hyun’s public disclosure—disrupts existing interpretive routines and opens space for alternative framings; second, these new frames are taken up and reinforced across multiple sites of discourse, as social media users, news organizations, and civil society actors begin to adopt a shared vocabulary of structural critique; and third, through sustained repetition and cross-platform diffusion, these frames cease to be novel interventions and become the default interpretive lens through which new incidents are understood, effectively displacing the earlier pattern of episodic, scandal-driven reactions. The second step in this sequence deserves particular emphasis because frames do not sustain themselves: they require organizational carriers that maintain and diffuse them between moments of peak mobilization (Tarrow, 2022). In the Korean case, this carrier function was performed by a well-coordinated women’s movement with deep roots in the pro-democracy struggles of the 1980s and 1990s, whose organizations had long worked to reframe gender violence as a political and rights-based issue rather than a private matter (Moon, 2002). 
The density and multigenerational character of this civil society infrastructure helps explain why the structural frames catalyzed by #MeToo stabilized in Korea to a degree not observed in all national contexts. Through this lens, the Korean #MeToo movement becomes not only a moment of mass mobilization but also a mechanism for institutionalizing new discursive norms, reshaping how publics interpret sexual violence well beyond the movement’s peak visibility.
Methods
Topic Modeling
This study employs topic modeling, a computational method that integrates word frequency information from a global statistical perspective to provide a macroscopic understanding of the overarching thematic structure of texts (Arnold & Arnold, 2023; Badryzlova et al., 2021; Liu & Jin, 2020; Uglanova et al., 2020; Wang & Hsieh, 2023). Compared with more localized approaches such as collocation or keyword analysis, topic modeling offers a more comprehensive method for discourse analysis, uncovering multifaceted relationships between topics and allowing dynamic analyses of language use and topic interactions over time (Brookes & McEnery, 2019; Goel & Sharma, 2020; Hall et al., 2008; Marjanen et al., 2020; Viola & Verheul, 2020). Time series analysis of topic distributions further enables tracking of discourse evolution.
Data Collection and Pre-processing
To examine the evolution of discourse on sexual violence in Korea, tweets were collected from January 2015 to March 2023. A comprehensive keyword search was conducted, including terms and hashtags related to sexual violence, sexual harassment, and misogyny, such as “seongpoklyeok” (성폭력), “seonghuirong” (성희롱), and “yeoseonghyeomo” (여성혐오) (collectively referred to as “Sexual Violence” keywords hereafter). The keyword “misogyny” was included to reflect the attitudes and sentiments associated with cyber sexual violence and harassment cases in the early 2000s that underpinned Korea’s #MeToo movement (Kim, 2015). Lee et al. (2019: 108), for example, note that the movement can be seen as “women’s anger at the spread of misogynistic remarks in online spaces over the past decade and women’s solidarity to support the victims.” For the period after early 2018, additional keywords were included such as hashtags and text mentions of “#MeToo” and its Korean transliteration “mitu” (미투), “MeeToo,” “MeNext,” and “WithYou” and its Korean transliteration “wideuyu” (위드유), collectively referred to as “#MeToo” keywords henceforth.
Our keyword list was developed through a combined deductive and inductive process. We began with widely used general terms (e.g., sexual violence, sexual harassment, and misogyny) based on prior media and academic usage, and then added #MeToo-related keywords after 2018. The same core “sexual violence” keywords were applied to both pre- and post-2018 data to ensure comparability; after 2018, however, #MeToo terms increasingly subsumed other related keywords, as discussed below.
We acknowledge this approach means that some case-specific hashtags may not have been fully identified. However, this limitation applies to both periods. In practice, major post-2018 cases (e.g., the 2020 Seoul Mayor Park case and the 2022 military sexual violence case) were captured, probably because users typically used multiple hashtags combining case-specific tags with broader terms such as #MeToo or sexual violence. While it is not feasible to identify every possible case-specific keyword, our design captures major cases insofar as they were framed within the broader discourse of sexual violence.
The tweet dataset was collected in April and May 2023 using the snscrape library (https://github.com/JustAnotherArchivist/snscrape), an open-source tool for scraping data from social networking services (SNS). It interfaces with the platform’s web search endpoints rather than the official API: it submits keyword queries, retrieves the tweets returned by Twitter’s search system, and iteratively follows pagination cursors to expand the result set. Because these endpoints implement proprietary relevance ranking, spam and safety filtering, and depth limits, the resulting dataset is a ranked and filtered subset of the tweets matching the query, not a complete or random sample of all posts. It is a platform-mediated sample: content filtered by Twitter’s internal moderation, safety, and ranking mechanisms, such as bot-generated posts, NSFW or policy-flagged material, and content from shadow-banned users, is unlikely to appear in it. In other words, snscrape approximates what a user would see when searching on Twitter, and the corpus should be interpreted in downstream analyses as reflecting platform-mediated visibility.
To enhance data quality, the tweets were cleaned in four preprocessing steps:
Step 1. Removed tweets containing irrelevant terms, identified through manual inspection of a sample dataset.
Step 2. Eliminated duplicate tweets and those missing essential metadata (e.g., date and user information).
Step 3. Removed hashtags and symbols, and discarded very short tweets (fewer than five words). Words were counted in eojeol (word + affix), the space-delimited unit of Korean.
Step 4. Applied a BERTopic model to semantically cluster tweets and filter out outlier clusters unrelated to sexual violence or #MeToo.
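Steps 1 to 3 of the cleaning procedure can be illustrated with a minimal sketch. It is not the study's own pipeline: the record layout (`text`, `date`, `user` keys), the function name `clean_tweets`, the regular expressions, and exact-text matching for duplicates are all assumptions made for illustration. Step 4 (BERTopic-based outlier filtering) is omitted.

```python
import re

HASHTAG = re.compile(r"#\S+")      # assumed hashtag pattern: '#' followed by non-space characters
SYMBOLS = re.compile(r"[^\w\s]")   # anything that is neither a word character nor whitespace

def clean_tweets(tweets, irrelevant_terms, min_eojeol=5):
    """Apply Steps 1-3 to a list of dicts with 'text', 'date', and 'user' keys (hypothetical layout)."""
    seen = set()
    cleaned = []
    for tw in tweets:
        text = tw.get("text") or ""
        # Step 1: drop tweets containing manually identified irrelevant terms.
        if any(term in text for term in irrelevant_terms):
            continue
        # Step 2: drop tweets missing essential metadata, then exact duplicates.
        if not tw.get("date") or not tw.get("user"):
            continue
        if text in seen:
            continue
        seen.add(text)
        # Step 3: strip hashtags and symbols, then count eojeol by splitting on whitespace.
        stripped = SYMBOLS.sub(" ", HASHTAG.sub(" ", text))
        eojeol = stripped.split()
        if len(eojeol) < min_eojeol:
            continue
        cleaned.append({**tw, "text": " ".join(eojeol)})
    return cleaned
```

Because eojeol are delimited by spaces, a plain whitespace split is enough for the length filter; no morphological analysis is needed at this stage.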
BERTopic (Grootendorst, 2022) is a BERT-based topic modeling approach designed to uncover latent themes within document collections. Although traditional methods such as Latent Dirichlet Allocation (LDA) remain widely used in social science research, BERTopic has been shown to offer improvements in interpretability and accuracy. This advantage stems from its use of contextual embeddings derived from sentence transformers, which capture semantic meaning more effectively than earlier techniques (Egger & Yu, 2022). In contrast to traditional word embeddings, where each term is assigned a single, static vector, contextual embeddings adapt representations according to surrounding text. Encoder models such as sentence transformers and BERT extend this principle to larger units of language, such as sentences and paragraphs. BERTopic employs these embeddings to preserve semantic relationships across documents and then applies density-based clustering to form topic groups and generate interpretable topic representations. These components can be adjusted and fine-tuned to accommodate the specific characteristics and analytical needs of a given corpus (Grootendorst, 2022).
Number of Tweets in Each Step of Data Cleaning
Since the #MeToo-related keywords were not present before 2018, tweets posted before 2018 were collected using only the “Sexual Violence” keywords, while tweets from 2018 onward were collected with “#MeToo” keywords as well. The number of tweets with the “Sexual Violence” keywords showed a sharp decline from 163,919 before 2018 to 52,459 after 2018, suggesting that the #MeToo movement absorbed much of the prior discourse on sexual violence (with many users using both “#MeToo” and “Sexual Violence” keywords). To verify this, we built two topic models on two datasets: Model A, comprising “Sexual Violence” tweets from 2015 to 2017 and “#MeToo” tweets from 2018 to 2023, and Model B, including all “Sexual Violence” tweets (2015–2023) and “#MeToo” tweets. Figure 1 shows the temporal evolution of two key topics, “Misogyny and Feminism” and “#MeToo in Entertainment,” in both models (the y-axis shows the topic proportion for each year). Topic_03 from Model A aligns closely with topic_02 from Model B, and topic_22 in Model A matches topic_16 in Model B, exhibiting nearly identical trends. Because the topics follow similar trajectories in both datasets, which supports the hypothesis that the #MeToo movement absorbed much of the previous discourse on sexual violence, we adopted Model A for subsequent analysis.
Figure 1. Temporal Evolution of Example Topics in Model A and Model B (the Red Dotted Line Indicating 2018)
Cleaned tweets (that is, tweets with hashtags removed as part of the four preprocessing steps described above) were then analyzed using a Korean morphological analysis tool to break down words into their constituent morphemes, such as roots, prefixes, and suffixes, and to identify their parts of speech. Morphological analysis was conducted using the Kiwi library, 6 and a custom user dictionary was created and applied to address OOV (out-of-vocabulary) terms. The text was subsequently reconstructed to focus on content words such as nouns, verbs, adjectives, and adverbs. Keywords such as “#MeToo” and “sexual violence,” which were used for tweet collection, were treated as stopwords 7 and excluded.
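The content-word reconstruction described above can be sketched as a filter over (morpheme, part-of-speech tag) pairs. This is an illustrative sketch only: the function name `keep_content_words` is hypothetical, the tagged input is hand-written rather than produced by an analyzer, and the tag prefixes (NN for nouns, VV for verbs, VA for adjectives, MA for adverbs) are an assumption based on the Sejong-style tag set that Korean analyzers such as Kiwi commonly use.

```python
# Assumed Sejong-style tag prefixes for nouns, verbs, adjectives, and adverbs.
CONTENT_TAG_PREFIXES = ("NN", "VV", "VA", "MA")

def keep_content_words(tagged, stopwords):
    """Keep content morphemes from (form, tag) pairs and drop collection keywords."""
    return [
        form
        for form, tag in tagged
        if tag.startswith(CONTENT_TAG_PREFIXES) and form not in stopwords
    ]

# Hand-written example input; in practice the pairs would come from the morphological analyzer.
tagged_tweet = [
    ("피해자", "NNG"), ("를", "JKO"), ("지지", "NNG"),
    ("하", "XSV"), ("ᆫ다", "EF"), ("미투", "NNG"),
]
content = keep_content_words(tagged_tweet, stopwords={"미투"})
```

Here the particle, the verbal suffix, and the ending are discarded as function morphemes, and the collection keyword is dropped as a stopword, so that it does not dominate every topic.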
Topic Modeling With DMR (Dirichlet Multinomial Regression)
The cleaned corpus was then analyzed using a topic modeling algorithm. Topic modeling is an unsupervised machine learning technique that identifies hidden thematic structures within a large collection of documents. By analyzing word co-occurrence patterns, it clusters related terms into topics. As mentioned above, the most popular topic modeling algorithm is Latent Dirichlet Allocation (LDA), an algorithm that assumes each document is a mixture of multiple topics with specific word distributions. LDA estimates these distributions probabilistically, revealing the underlying thematic composition of the document corpus (Blei et al., 2003). This study employed the Dirichlet Multinomial Regression (DMR) model, an extension of LDA, to better understand the relationship between metadata variables and the distribution of topics within documents (Mimno & McCallum, 2008). Similar to Structural Topic Models, the DMR model can incorporate the metadata variables, such as date, to estimate topic distribution over time (Heo, 2019; Kim et al., 2023). 8
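For readers unfamiliar with DMR, its generative structure can be summarized as follows, in the formulation of Mimno and McCallum (2008). The notation is introduced here for illustration and is not taken from the present study: x_d denotes the metadata feature vector of document d (for example, an encoding of its date), λ_k the regression coefficients of topic k, and φ_k the word distribution of topic k.

```latex
\begin{aligned}
\lambda_k &\sim \mathcal{N}(0, \sigma^2 I) \\
\phi_k &\sim \mathrm{Dirichlet}(\beta) \\
\alpha_{d,k} &= \exp\!\left(x_d^{\top} \lambda_k\right) \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha_d) \\
z_{d,n} &\sim \mathrm{Multinomial}(\theta_d), \qquad
w_{d,n} \sim \mathrm{Multinomial}\!\left(\phi_{z_{d,n}}\right)
\end{aligned}
```

The only departure from LDA is the third line: the Dirichlet prior over a document's topic proportions is no longer a shared constant but a log-linear function of that document's metadata, which is what lets topic prevalence vary with the date.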
To determine the optimal number of topics (k), we tested models with k-values from 3 to 30, assessing topic coherence with four metrics suggested by Röder et al. (2015), here named c_uci, c_npmi, c_v, and u_mass. c_uci is based on Pointwise Mutual Information (PMI) for word pair co-occurrence. c_npmi is the normalized PMI, ranging from −1 to 1. c_v is a composite measure evaluating coherence by considering both word pair co-occurrence and connections between words, and u_mass measures topic coherence using word co-occurrence. Higher c_uci, c_npmi, and c_v values, along with lower u_mass values, indicate better topic coherence. Figure 2 shows the trends of these metrics across different k-values. Based on the metric scores and manual inspection of the top 30 words per topic, k = 25 was selected as optimal.
Figure 2. How the Different Metrics Are Affected by the Number of Topics k
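To make the PMI-based metrics concrete, the sketch below computes a mean NPMI score for one topic's top words. It simplifies the metric in Röder et al. (2015) by estimating probabilities from document-level co-occurrence rather than a sliding window; that simplification, the function name `npmi_coherence`, and the handling of edge cases are assumptions for illustration.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words, using document-level co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Share of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)   # the pair never co-occurs: NPMI lower bound
        elif p12 == 1:
            scores.append(0.0)    # NPMI is undefined here; treated as 0 (assumption)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)
```

Dividing PMI by −log p(w1, w2) is what bounds the score between −1 (words never co-occur) and 1 (words only ever appear together), which makes values comparable across models with different k.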
Time Series Analysis
To analyze the evolution of topic distribution from January 2015 to March 2023, we conducted a time series analysis. Each tweet is modeled as a mixture of 25 topics (k = 25), and the topic proportions (θ) were combined with tweet dates to create a time series dataset suitable for statistical analysis. To analyze the temporal dynamics of topic distribution, we used Facebook’s Prophet model (Taylor & Letham, 2018), 9 which captures trends, seasonality, and holiday effects while handling non-linear trends and detecting outliers. Applying Prophet to the topic proportions from the DMR model allowed us to identify and visualize temporal patterns in #MeToo discourse.
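As an illustration of how per-tweet topic proportions can be turned into a time series, the sketch below averages θ within each calendar month. The monthly granularity and the function name are assumptions made for this example; the text above does not state the aggregation level used before fitting Prophet.

```python
from collections import defaultdict
from datetime import date


def monthly_topic_means(dates, thetas):
    """Average per-tweet topic proportions within each (year, month)."""
    sums = {}
    counts = defaultdict(int)
    for d, theta in zip(dates, thetas):
        key = (d.year, d.month)
        if key not in sums:
            sums[key] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[key][k] += p
        counts[key] += 1
    # Return months in chronological order.
    return {
        key: [s / counts[key] for s in sums[key]]
        for key in sorted(sums)
    }


# Toy example with two topics and three tweets.
tweet_dates = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 2, 1)]
tweet_thetas = [[0.2, 0.8], [0.4, 0.6], [1.0, 0.0]]
series = monthly_topic_means(tweet_dates, tweet_thetas)
```

Each topic's column of monthly means could then be arranged as a two-column table of dates and values (Prophet expects columns named `ds` and `y`) and fitted separately.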
Analyses and Findings
Trends in Monthly Tweet Volume: A Comparison With News Articles
We analyzed monthly tweet volumes from 2015 to March 2023. For comparison, we also examined articles from 22 news services, covering the same period and search terms. These services included 11 national dailies, 4 regional dailies, 5 TV broadcasters, and 2 economic newspapers, obtained via the BIGKINDS service. 10 The full list of sources is available in the Supplemental Information. Min-max normalization was applied to both datasets for visualization, as shown in Figure 3.
Figure 3. Monthly Volumes of Tweets and Newspaper Articles
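Min-max normalization rescales each series to the range [0, 1] so that tweet and article volumes of very different magnitudes can be plotted on a common axis. A minimal sketch follows; the function name and the handling of a constant series are assumptions of this example.

```python
def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy monthly counts (not the study's data).
normalized = min_max([10, 20, 30])
```

Because each series is normalized separately, the values in Figure 3 express volume relative to that medium's own minimum and maximum, not absolute counts.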
The peak volume of both tweets and newspaper articles occurred in March 2018, following prosecutor Seo Ji-hyun’s public disclosure of sexual harassment within the Prosecutors’ Office at the end of January, which ignited the spread of #MeToo accusations across various sectors in Korea. The Pearson correlation coefficient between tweet and newspaper article data from 2015 to March 2023 was 0.92 (p < 2.2e-16), indicating a high overall correlation.
However, before January 2018, the correlation was low (r = 0.104, p = 0.5706), indicating almost no relationship between the two media. For example, in August 2015, media articles spiked to 0.29 in Figure 3 owing to TV coverage of the “Mother and Sons’ Sexual Assault Case” 11 (40 out of 1,566 articles directly addressed it), but tweet volume barely changed at 0.06 (1 direct mention out of 4,012 tweets). Conversely, in December 2016, tweets rose to 0.20, with tweets about misogyny and gender issues comprising 20.1% (2,288 out of 11,362 tweets). By comparison, media articles were lower at 0.12, with only 13.8% (89 out of 646 articles) covering the issue, highlighting different focal points between the two media.
In contrast, after January 2018, the correlation increased to 0.98 (p < 2.2e-16), reflecting almost identical patterns between the two media. The correlation remained high at 0.84 (p < 2.2e-16) even after excluding the February to April 2018 #MeToo peak. In July 2020, both tweets and articles surged simultaneously, coinciding with the #MeToo accusation against Seoul Mayor Park Won-soon and his subsequent suicide. The only exception was a divergence in September 2018, when tweets surged during the “#School_MeToo” movement while article volume decreased slightly.
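The before/after comparison amounts to computing Pearson's r separately on the two sub-periods. The sketch below shows this with toy numbers; the function names are hypothetical, and assigning the cutoff month itself to the "after" period is an assumption, since the text does not say where January 2018 falls.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_correlation(months, tweets, articles, cutoff):
    """Return (r_before, r_after) for months before and from the cutoff."""
    rows = list(zip(months, tweets, articles))
    before = [(t, a) for m, t, a in rows if m < cutoff]
    after = [(t, a) for m, t, a in rows if m >= cutoff]
    r_before = pearson_r([t for t, _ in before], [a for _, a in before])
    r_after = pearson_r([t for t, _ in after], [a for _, a in after])
    return r_before, r_after


# Toy series keyed by (year, month); values are illustrative only.
months = [(2017, 10), (2017, 11), (2017, 12), (2018, 1), (2018, 2), (2018, 3)]
tweets = [1, 2, 3, 1, 2, 3]
articles = [3, 1, 2, 2, 4, 6]
r_before, r_after = split_correlation(months, tweets, articles, cutoff=(2018, 1))
```

A series with zero variance in either sub-period would make the denominator zero, so real data should be checked for that case before calling `pearson_r`.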
The heightened alignment of Twitter and legacy media after 2018 can be attributed to prosecutor Seo’s public #MeToo disclosure, which transformed the issue into a nationwide concern. This is evident in the increased sharing of news links in tweets from January 2018 to March 2020, as shown in Figure 4. Before 2018, sexual violence discussions on Twitter were more fragmented, less publicized, and lacked widespread empathy.
Figure 4. Monthly Distribution of URLs Included in Tweets
Topic Modeling Analysis
Topic Terms and Titles
The 25 topics were validated by manual inspection of a sample of tweets from each topic, to confirm interpretability and coherence. This strengthens confidence in the topic classifications and supports the aggregation into broader thematic categories. Due to the informal nature of Twitter discourse, conversational markers remain in many topics. However, this does not affect topic labeling, as topic labels were assigned during the manual inspection step, based on the semantic coherence of the tweets assigned to each topic.
Figure 5 illustrates the proportion of each topic within the total dataset, with all 25 topics summing to 100%. The topic proportions were calculated based on topic composition within each tweet. Seven major topics, each accounting for more than 5%, together constitute 54% of the total content, representing dominant discourses.
Figure 5. Proportions of 25 Topics
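One way to obtain corpus-level topic proportions from per-tweet compositions is to average each topic's θ across all tweets, which guarantees that the shares sum to 1. The sketch below assumes this unweighted averaging (the text does not specify whether tweets were weighted, for example by length) and uses hypothetical function names.

```python
def corpus_topic_shares(thetas):
    """Mean proportion of each topic across all tweets."""
    n = len(thetas)
    k = len(thetas[0])
    return [sum(theta[j] for theta in thetas) / n for j in range(k)]


def major_topics(shares, threshold=0.05):
    """Topics whose share exceeds the threshold, largest first."""
    selected = [(j, s) for j, s in enumerate(shares) if s > threshold]
    return sorted(selected, key=lambda item: -item[1])


# Toy example: two tweets, three topics.
shares = corpus_topic_shares([[0.5, 0.5, 0.0], [0.7, 0.2, 0.1]])
majors = major_topics(shares)
```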
The most prominent topics, as shown in Figure 5, are topic_08 (10.14%), “Emotional Responses to Sexual Violence/#MeToo with Profanity,” and topic_02 (9.73%), “Personal Reactions to Everyday #MeToo/Sexual Violence Cases,” which together underscore Twitter’s role in openly expressing personal emotions and fostering public discussions on #MeToo. Topic_10 (7.10%), “#MeToo/Sexual Violence in Politics and Elections,” reflects the political impact of the movement, including allegations against prominent politicians. This issue is linked to the allegations against and suicide of the high-profile politician Seoul Mayor Park Won-soon, as well as earlier suicide cases involving famous actors and a female victim of military sexual violence, forming topic_07 (6.65%), “Controversy and Responses to Suicides Related to #MeToo/Sexual Violence.” Topic_03 (8.74%) and topic_17 (5.38%) address sexual violence from perspectives of misogyny, gender conflict, and feminism. Finally, topic_09 (5.16%), “Physical Accounts of Specific Personal and Others’ Experiences,” highlights personal accounts of direct and indirect stories, often absent from traditional media.
Next, we used Kullback–Leibler divergence (KLD) to assess distances between topics based on word and topic distributions, and grouped similar topics using hierarchical clustering. For each pair (x,y) of topics, we calculated their difference as KLD(x,y) + KLD(y,x), to account for the asymmetry in the definition of the KLD. Figure 6 illustrates the clustering results.
Figure 6. Heatmap Showing How Topics Cluster According to Topic Similarity
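The symmetrized distance KLD(x,y) + KLD(y,x) can be sketched as follows. The small smoothing constant `eps`, the function names, and the toy distributions are assumptions of this example; the resulting matrix is what a hierarchical clustering routine would take as input, and the linkage method is not specified in the text.

```python
import math


def kld(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    # eps guards against log(0) when q has zero-probability entries.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kld(p, q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return kld(p, q) + kld(q, p)


def distance_matrix(topics):
    """Pairwise symmetric KLD between topic distributions."""
    n = len(topics)
    return [
        [0.0 if i == j else symmetric_kld(topics[i], topics[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy topic-word distributions: topics 0 and 1 are similar, topic 2 differs.
toy_topics = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
dist = distance_matrix(toy_topics)
```

Summing the two directions makes the measure symmetric, which is what distance-based clustering requires, although it still does not satisfy the triangle inequality.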
Topic Clusters
Group 1, covering about 56.4% of the total topics, focuses on “Social and Public Perspectives on Sexual Violence and the #MeToo Movement,” consisting of 16 topics clustered into six major TOPICs: “Media Coverage, Online Sharing, and Reactions” (TOPIC1), “Allegations, Support for #MeToo, and Civil Activism and Solidarity” (TOPIC2), “Cases Involving Politicians and Societal Reactions” (TOPIC3), “Sexual Violence from Gender Perspective and Feminism as Resistance” (TOPIC4), “Controversies over Legal Penalties” (TOPIC5), and “Testimonies in Schools and Calls for Prevention” (TOPIC6). In contrast, Group 2, excluding two topics with unclear themes (TOPIC10), focuses on “Personal Experiences and Emotional Responses,” covering seven topics clustered into four major TOPICs, accounting for 43.6%. This Group addresses “Reactions to Cases and Awareness of Issues in Everyday Life and Popular Culture” (TOPIC7), reports of direct and indirect experiences of sexual violence/#MeToo in workplaces, universities, and schools (TOPIC8 and TOPIC11), and personal emotional reactions (TOPIC9).
Changes in Topic Distribution Over Time
Differences in Topic Distribution Before and After 2018
In Group 1 (“Social and Public Perspectives”), 11 out of 16 topics (68.8%) changed by more than a factor of two, while in Group 2 (“Personal Experiences”), only one out of seven topics (14.3%) did. Notably, the three topics in TOPIC3, “#MeToo Cases Involving Politicians and Societal Reactions,” all saw changes greater than 2.5 times. This suggests that in the post-#MeToo period, Twitter users shifted towards viewing the movement through a social and political lens, rather than focusing on personal experiences and emotional reactions.
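The text does not define the change rate explicitly. One plausible reading, used in the sketch below purely as an assumption, is the ratio of a topic's mean proportion from 2018 onward to its mean proportion before 2018; the function name and toy values are hypothetical.

```python
def change_ratio(years, proportions, cutoff_year=2018):
    """Ratio of mean topic proportion from cutoff_year onward to the mean before it."""
    before = [p for y, p in zip(years, proportions) if y < cutoff_year]
    after = [p for y, p in zip(years, proportions) if y >= cutoff_year]
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return mean_after / mean_before


# Toy yearly proportions for one topic (not the study's data).
ratio = change_ratio([2016, 2017, 2018, 2019], [0.02, 0.02, 0.05, 0.07])
exceeds_twofold = ratio > 2.0
```

Under this reading, a topic counts as having changed by more than a factor of two when the ratio exceeds 2; a symmetric definition that also flags ratios below 0.5 would be an equally defensible choice.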
To analyze these changes dynamically, we examined the topic distribution per tweet over time using time series analysis. The x-axis in each graph represents the year (with the red line marking 2018), and the y-axis shows the proportion of the topic. This analysis reveals four main patterns of change over time, as demonstrated in Figure 7.
Figure 7. Types of Temporal Changes in Topic Distribution (Red Line Indicating 2018)
Interestingly, topics with similar themes do not always follow the same temporal patterns. For example, topic_03, topic_17, and topic_18 are grouped under “Sexual Violence from Gender Perspective,” but while topic_03 and topic_17 align with Pattern 1, topic_18 follows Pattern 2. This indicates a shift in framing from gender conflict or feminism (topic_03 and topic_17) to a focus on gender-based power relations and human rights (topic_18) after the 2018 #MeToo movement. Similarly, among the three topics under TOPIC1, “Media Coverage, Online Sharing, and Reactions,” topic_06 and topic_24 follow Pattern 2, peaking after 2018, while topic_00 aligns with Pattern 4, showing steady growth. This division suggests that initial reactions to media coverage and #MeToo disclosures evolved into a broader focus on sharing personal experiences and supporting victims, reflecting a growing recognition of sexual violence as an issue rooted in power dynamics and increasing solidarity with victims.
Furthermore, topics associated with Pattern 4, such as topic_00 (Support via Online Sharing), topic_15 (Gender Issues in Popular Culture), and topic_20 (Collaborative Actions for Women’s Rights), reflect growing support, solidarity, and shifting perceptions. These range from low-stakes support, via simply sharing relevant online articles, as captured by topic_00, to active criticism of sexist perspectives reflected in popular culture (topic_15) and participation in human rights civil activities (topic_20). This highlights the spread of diverse forms of support and solidarity, aligning with research indicating that Twitter fosters stronger empathy and support compared to other media (Bae et al., 2021; Manikonda et al., 2018) and that the #MeToo movement contributed to shifting public attention from perpetrators toward empathy for victims (Choi & Lee, 2020). Specifically, the increased awareness and public discussion of sexual violence in popular culture (topic_15) signals a broader societal shift towards recognizing sexual violence against women, with #MeToo sparking awareness and sensitivity reflected in daily life. Topic_20, relating to solidarity activities, showed initial growth followed by a decline and later resurgence. This contrasts with research suggesting that, for Twitter users who were uninvolved in the #MeToo movement—what Baik et al. (2022) call a “Neutral Sample of Twitter users”—sexual assault and harassment were not major topics of discussion on Twitter before #MeToo and increased only slightly afterward. Rather, our results align more closely with findings that Twitter support increased significantly in March 2018 (Bae et al., 2021). Unlike these short-term analyses, this study observed trends over 8 years, showing a sustained shift in perceptions and activities toward solidarity and empathy for victims in the post-#MeToo period.
TOPIC3, “#MeToo Cases Involving Politicians and Societal Reactions,” also shows a shift in perceptions. Initially, attention centered on accusations and conspiracy theories involving progressive politicians (topic_23). Over time, this focus waned, while recognition of #MeToo incidents as crimes and subjects of political scrutiny (topic_07, topic_10) grew, peaking around 2021–22 before tapering off. This pattern reflects the movement’s role in driving lasting changes in perceptions of sexual violence.
Lastly, topics under TOPIC5, “Controversies over Legal Penalties,” follow either Pattern 1 (topic_14) or Pattern 3 (topic_19 and topic_21), with little change or a decline in tweet volume after 2018 (Table 4). This trend contrasts with a rise in reported sexual crimes and subsequent legal actions (Choi & Lee, 2020). Paradoxically, increased disciplinary measures and legal penalties have coincided with reduced calls for such actions or preventive measures. Prior research shows a surge in sexual crime reports and counseling post-#MeToo, alongside expanded disciplinary measures by public institutions (Choi & Lee, 2020). Notably, the scope of reporting has expanded beyond physical contact to include verbal sexual violence (Choi et al., 2018), signifying broader shifts in societal perceptions of sexual crimes after the #MeToo movement.
Discussion
Taken together, these results reveal a transition from fragmented and event-driven discourse to a stable, institutionalized set of frames that structure ongoing public meaning. The central theoretical contribution of this study is the argument that the #MeToo movement in Korea produced a form of discursive institutionalization: a reconfiguration of public meaning in which fragmented, episodic conversations about sexual violence were reorganized into coherent, stable, and structurally oriented frames. Our longitudinal analysis provides several lines of evidence for this process.
First, the alignment between social media and mainstream news agendas after 2018 demonstrates discursive consolidation. Before #MeToo, tweet and news article volumes were weakly correlated, reflecting distinct discursive spheres. Twitter users focused on misogyny, everyday grievances, and emotionally charged accounts, whereas news media emphasized high-profile or sensational cases. After Seo Ji-hyun’s disclosure, however, these two spheres converged dramatically. This indicates that #MeToo introduced a widely shared interpretive schema through which both publics and media organizations framed incidents of gendered violence. Such alignment is a hallmark of institutionalization: the diffusion and stabilization of shared meaning structures across platforms, audiences, and sectors.
Second, the topic modeling results show a clear shift from event-driven discourse to thematic, structural framing. Event-driven discourse refers to short-lived spikes in attention tied to specific incidents, such as celebrity scandals, individual accusations, or sensational media coverage. In the pre-2018 period, this discourse was characterized by episodic outrage, gender conflict rhetoric, and emotionally charged reactions—often expressed through profanity or polarized exchanges (e.g., topics related to misogyny debates, scandal-driven reactions, and emotional responses). These conversations tended to dissipate once media attention shifted, leaving little sustained engagement with underlying causes.
Following the 2018 #MeToo movement, however, discourse increasingly adopted thematic structural frames that interpreted individual cases as manifestations of systemic inequality. Topics emphasizing human rights, gendered power relations, legal accountability, and institutional responsibility expanded and persisted over time. Rather than focusing solely on particular perpetrators or incidents, users began linking cases to broader issues such as workplace hierarchies, cultural norms, and failures of legal protection. This shift from volatile emotional narratives to stable thematic interpretations parallels transitions described in agenda-setting and discursive institutionalist theory, in which issues evolve from personalized grievances to recognized “public problems” embedded in normative and institutional frameworks. The rising prominence of TOPIC4 (“Sexual Violence from Gender Perspective and Feminism as Resistance”) and TOPIC5 (“Controversies over Legal Penalties”), addressing gendered power relations, human rights violations, and legal handling of cases, illustrates how #MeToo enabled Korean publics to reframe sexual violence as a systemic injustice rather than a series of isolated events.
Third, the decline in volatility and the emergence of sustained frames signal the formation of new discursive norms. Several topics show consistent growth or stable prominence across multiple years, even in the absence of major triggering events. Topic_20 (Collaborative Actions for Women’s Rights) and topic_15 (Gender Issues in Popular Culture) are notable examples. Their persistence suggests that the interpretive frameworks catalyzed by #MeToo continued to structure how users understood gendered power long after the movement’s initial momentum. This stability marks a shift from episodic to thematic memory: the movement not only reframed immediate events but also reshaped the cognitive schemas through which publics recall, interpret, and classify new incidents.
Fourth, the restructuring of topics in the political domain illustrates institutionalization’s reach beyond feminist circles. Topics relating to political accountability (topic_07 and topic_10) rise sharply after 2018, surpassing earlier conspiracy-oriented discourse (topic_23). This indicates that #MeToo not only exposed individual perpetrators but also institutionalized a norm in which political figures are accountable for gendered misconduct. The discursive move from rumor and conspiracy toward procedural scrutiny (“investigation,” “evidence,” “legal responsibility”) reflects the embedding of normative expectations about power and gender into political discourse.
Finally, the temporal patterns reveal how #MeToo absorbed and reorganized prior discourses. Our comparison of Model A and Model B demonstrates that the movement did not emerge from nowhere; rather, it integrated earlier conversations around misogyny, school-based sexual violence, and feminist activism. After 2018, these once-separate discourses converge into a more unified system centered on structural power and rights. This “absorption and reconfiguration” dynamic is a key mechanism of discursive institutionalization: new shock events reorganize how older narratives are remembered, linked, and interpreted.
We acknowledge that our analysis demonstrates discursive shifts but cannot fully isolate the causal mechanisms driving them. The reframing of sexual violence in structural and rights-based terms on Korean Twitter likely reflects the convergence of multiple forces beyond the #MeToo movement itself. International developments, including the global diffusion of #MeToo, as well as the broader mainstreaming of human rights and gender equality discourse, may have provided Korean users with new vocabulary and interpretive resources when discussing sexual violence. Domestically, the sustained advocacy of human rights and gender rights organizations likely contributed to the receptivity of Korean publics to thematic, rights-based frames when #MeToo arrived. As Shin (2021) documents, a new generation of Korean feminists—radicalized by events such as the 2016 Gangnam Station femicide and the hidden camera protests—were already mobilizing around structural critiques of gendered power before Seo Ji-hyun’s testimony, suggesting that #MeToo accelerated and consolidated a discursive shift that was already underway rather than creating it from nothing. Our computational approach captures the timing and pattern of these shifts but cannot disentangle the relative contributions of the movement itself, longstanding activist labor, generational change, or transnational diffusion. What the data do show, however, is that regardless of the precise mix of causes, the post-2018 period marks a measurable and durable reorganization of public discourse—one in which these various forces converged to produce a more stable and structurally oriented discursive field than had existed before.
Conclusion
This study examined 8 years of Korean Twitter discourse on sexual violence to evaluate how the #MeToo movement reshaped public meaning. By applying topic modeling and time-series analysis to 351,582 tweets, we identified major thematic clusters and traced their evolution across pre- and post-2018 periods. Our findings support the theoretical claim that #MeToo produced a form of discursive institutionalization, shifting public discussion from fragmented, episodic accounts toward stable, thematic frames emphasizing human rights, gendered power structures, and political accountability.
This transformation is visible in several empirical patterns: the convergence of social media and news agendas, the decline of emotional and conflict-oriented topics, the rise of structural and institutional frames, and the persistence of new narratives across multiple years. Together, these patterns illustrate how digital activism can reorganize discursive fields, generating durable changes in how societies conceptualize gendered violence. The Korean case demonstrates that hashtag movements do not merely amplify existing grievances—they can reconstitute the frameworks through which publics understand social problems, creating new norms, expectations, and repertoires of interpretation.
Although the empirical focus is on Korea, the mechanisms likely operate in other digital public spheres. Studies of #MeToo in India, Japan, and the U.S. show similar struggles over frame stabilization, suggesting the broader relevance of discursive institutionalization as a model for understanding digital activism.
It is also worth reflecting on the fact that the platform examined in this study has itself undergone a significant transformation. Following Elon Musk’s acquisition, Twitter was rebranded as X, which was accompanied by changes to content moderation policies, algorithmic curation, and researcher data access. These shifts raise important questions about the durability of the discursive institutionalization we have documented: our findings show that Twitter served as a critical site where fragmented narratives were consolidated into stable, structurally oriented frames, yet the institutional character of the platform itself was a condition of that process. Changes to moderation norms and user demographics under X may alter the discursive environment in ways that reinforce or erode the interpretive frameworks established during the #MeToo era. Future research should investigate whether the transition to X has disrupted these discursive norms or whether the institutionalized frames proved resilient enough to survive the platform’s own institutional upheaval.
Our findings also highlight the importance of long-term, computational approaches to studying digital activism. Whereas short-term analyses capture immediate reactions, they may miss the processes through which movements stabilize meaning and institutionalize discourse. By integrating 8 years of data, this study reveals that #MeToo’s most significant impact was not the intensity of public reaction but the durability of its reframing effects. The movement shifted Korea’s discursive landscape from focusing on individual perpetrators toward recognizing sexual violence as a systemic issue rooted in power and inequality.
The political significance of these findings becomes visible once the Korean case is situated within the global trajectory of #MeToo. Across multiple contexts, the movement has challenged stigmatizing narratives, increased reporting of sexual violence, and influenced policy debates, yet its discursive effects have varied depending on media systems and political cultures. In South Korea, the strength of these effects is inseparable from the country’s distinctive civil society landscape: decades of feminist organizing—from the pro-democracy movements of the 1980s, through advocacy for former comfort women in the 1990s, to the Candlelight Revolution and anti-femicide protests of the 2010s—created a dense organizational infrastructure and a multigenerational tradition of collective action that the #MeToo movement could draw upon (Hasunuma & Shin, 2019; Shin, 2021). Moreover, the rights-based framing that emerged so prominently in post-2018 discourse echoes the broader transnational feminist project of recasting women’s rights as human rights (Bunch, 2017), suggesting that Korean Twitter users were not inventing these frames in isolation but drawing on interpretive resources circulated through global feminist networks (Keck & Sikkink, 2014). Our results suggest that in South Korea, #MeToo not only amplified survivor testimonies but also contributed to a durable shift in how sexual violence is publicly understood. The transition from episodic, scandal-driven reactions to thematic frames emphasizing human rights, institutional accountability, and systemic inequality mirrors global efforts by feminist activists to reframe sexual violence as a structural problem. These findings reinforce the view of #MeToo as a political process of discursive transformation, in which digital activism, media coverage, institutional responses, and longstanding civil society mobilization interact to reshape public meaning.
Several directions for future research follow from these findings. Extending this framework to other national contexts—particularly those with different civil society traditions, media ecosystems, or histories of feminist mobilization—would help clarify which features of discursive institutionalization are specific to the Korean case and which generalize across digital public spheres. Comparative work across platforms is equally important: as feminist discourse migrates between Twitter/X, Instagram, YouTube, and domestic platforms such as NAVER, the mechanisms of frame stabilization and diffusion may operate differently depending on platform affordances, moderation regimes, and user demographics. Methodologically, pairing the computational approach adopted here with qualitative discourse analysis would offer a richer account of the processes through which digital movements translate online discourse into lasting cultural and political change. More broadly, this study demonstrates that the most consequential effects of hashtag activism may be the ones least visible in real time: not the initial surge of public attention, but the quiet restructuring of the interpretive frameworks through which societies come to understand inequality long after the hashtag has stopped trending.
Supplemental Material
Supplemental Material for From Fragmented Narratives to Systemic Critiques: Long-Term Transformations in Korean Twitter Discourse on Sexual Violence by Ji-Myoung Choi, Hye-Won Choi, and Chico Q. Camargo in Social Science Computer Review
Footnotes
Funding
This research was supported by the Ewha Frontier 10-10 Project (1-2023-0154-001-3) and Global - Learning & Academic research institution for Master's·PhD students, and Postdocs (G-LAMP) Program of the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (No. RS-2025-25442252).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.