Abstract
Algorithms are increasingly deployed as a frontline defense against digital extremism, yet their true effectiveness remains poorly understood. This review analyzes a decade of research (2013–2023) on algorithms for content detection and moderation. While researchers have reported high effectiveness for detection tools, we find this research constrained by a reliance on limited, platform-specific datasets. Beyond assessing current tools, we propose a mechanism scheme for using algorithms to interrupt the radicalization process. We conclude by outlining a new research agenda, calling for transparent, collaborative partnerships between social scientists and engineers to better detect and moderate extremist content and to ensure that the people who own the code are held accountable for the digital reality they create. To implement this agenda, social scientists should lead independent audits of moderation outcomes, while engineers should integrate accountability by embedding design principles into the early stages of algorithmic development. Furthermore, funders should mandate public–private transparency agreements as a prerequisite for research and development support.
Most people today share a fundamental aversion to violence. 1 However, this stance can be undermined by specific political or ideological frameworks that justify politico-ideological violence. 2 These justifications are often categorized as extremist because they sit at the far edges of a population’s range of beliefs. 3
Actors rely on propaganda to bridge the gap between adopting extremist views and committing real-world violence, a gap that often persists because of moral hesitation or a lack of tactical knowledge. At its core, propaganda involves a deliberate attempt to bypass an audience’s ability to think critically or deliberate freely, instead using persuasion techniques to secure a propagandist’s specific, desired response. 4 Essentially, it is communication designed to win over an audience by suppressing free thought.
How people communicate is shaped by the media they use. Today, digital communication platforms, the vast networks of software and devices that host social interactions, have become the primary battlegrounds for extremist propaganda.5–7 While these platforms allow propagandists to spread messages with unprecedented speed, they also provide the tools for fighting back: algorithms. Every communication medium operates under a specific media logic; that is, a set of technical rules and cultural norms that dictate what is communicated and how. 8 In the digital age, these logics are increasingly encoded into algorithms—sets of rules that, when run on computers, are designed to solve specific problems nearly instantaneously. 9 As these powerful tools transform how people experience information, they have also become the front line in detecting and countering propaganda shared on digital communication platforms.
The following PRISMA-guided scoping review moves beyond a simple census of technology to critically map the landscape of counterpropaganda algorithms. Specifically, we categorize the current state of research into two primary functional areas: content detection algorithms (CDAs) and content moderation algorithms (CMAs). Based on a synthesis of these findings, we propose a novel mechanism scheme that traces the hypothesized causal process through which algorithms may interrupt digital radicalization and thereby disrupt pathways to violence. Overall, we argue that the efficacy of these tools is fundamentally bound by the algorithm designer’s personal definitions, data training choices, and inherent biases, limitations reflected in part by the field’s disproportionate focus on Islamist extremism at the expense of other ideological contexts. Readers can expect this work to clarify the limited scope of current science in this area and offer a strategic roadmap for social scientists and developers to collaborate on more inclusive, transparent, and multifaceted algorithmic interventions.
Key Concepts: Counterpropaganda Tools
While the term radicalization is often used broadly, it generally describes a process whereby an in-group begins to view others with increasing hostility, ultimately endorsing violence against them. 10 These extremist views can influence individuals even when acting alone, provided they feel connected to a larger group or movement. 11 Because radicalization relies heavily on communication, propaganda plays a central role in the process. Today, extremist groups have moved beyond traditional methods, using social media, encrypted messaging apps, and even video games to spread their messages.12,13 In response, researchers and technology developers are increasingly turning to algorithms as a vital tool for detecting and disrupting this digital pipeline.
CDAs serve as the digital eyes of a platform, identifying extremist material amid vast amounts of data. Some CDAs use text mining to spot specific patterns or keywords, 14 while more advanced systems apply machine learning to improve their accuracy over time. Rather than just following a static list of banned words, these systems analyze existing data to predict and identify new patterns of propaganda.15,16 Once content is detected, content moderation determines the response, which involves enforcing participation standards and norms on digital platforms. 17 While human moderators traditionally handled this work by removing posts or banning users, it is increasingly managed by content moderation algorithms (CMAs). These tools automatically classify user-generated content and trigger a governance outcome, such as deleting a post or deactivating a user’s account based on the platform’s policies. 18 Together, CDAs and CMAs form the technical backbone of modern counterpropaganda efforts on digital communication platforms.
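As a minimal sketch of the simplest kind of CDA described above, the following Python snippet flags posts that contain terms from a static watchlist. The watchlist entries, the function name `keyword_flags`, and the sample posts are hypothetical placeholders and are not drawn from any study in this review.

```python
import re

# Hypothetical watchlist; placeholder tokens stand in for real terms.
WATCHLIST = frozenset({"bannedterm", "recruitcode"})


def keyword_flags(post: str, watchlist: frozenset[str] = WATCHLIST) -> set[str]:
    """Return the watchlist terms that appear as whole words in the post."""
    tokens = set(re.findall(r"[a-z0-9]+", post.lower()))
    return tokens & watchlist


# Hypothetical sample posts.
posts = [
    "Join us, the recruitcode is in the usual channel",
    "A news report about moderation policy",
]
flagged = [p for p in posts if keyword_flags(p)]
```

A static list of this kind cannot recognize vocabulary it has never been given, which is one reason the more advanced systems mentioned above learn patterns from existing data instead.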
Methodology
To identify the 33 studies included in this scoping review (29 focusing on CDAs and four on CMAs), we conducted a systematic search on Google Scholar from February to April 2024. The search targeted peer-reviewed journal articles published in English between 2013 and 2023 that provided empirical assessments of tools used to counter extremist propaganda. Following established protocols for Google Scholar-based reviews, we screened 300 total retrievals across 30 pages of results, continuing the search until the relevance rate fell below a 0.3 threshold (i.e., fewer than 30% of subsequent page results presented relevant studies). Two independent reviewers screened the initial results for eligibility, resolving discrepancies through consensus to ensure the final sample met strict criteria for methodological transparency and operationalization of extremist content. For a detailed description of the search strings, specific inclusion/exclusion criteria, and the PRISMA-style selection flow, please refer to the Supplemental Materials.
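The stopping rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: we assume the relevance rate is computed per results page of 10 retrievals (300 retrievals across 30 pages), and the per-page counts in `example_counts` are hypothetical, not the actual screening data.

```python
RESULTS_PER_PAGE = 10  # 300 retrievals across 30 pages, per the text
THRESHOLD = 0.3        # relevance threshold stated in the text


def pages_screened(relevant_per_page: list[int],
                   per_page: int = RESULTS_PER_PAGE,
                   threshold: float = THRESHOLD) -> int:
    """Count pages screened, stopping after the first page whose
    share of relevant results drops below the threshold."""
    for page_number, relevant in enumerate(relevant_per_page, start=1):
        if relevant / per_page < threshold:
            return page_number
    # The rate never dropped below the threshold: every page was screened.
    return len(relevant_per_page)


# Hypothetical counts of relevant hits on successive result pages.
example_counts = [6, 5, 4, 3, 2, 1]
stop_page = pages_screened(example_counts)
```

With these made-up counts, the fourth page sits exactly at the threshold (3 of 10) and is still screened; the fifth page (2 of 10) falls below it and ends the search.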
Tool 1: Content Detection Algorithms
Our analysis included 29 core studies that met strict criteria for evaluating content detection algorithms. We found a significant imbalance in the current state of science on this subject. While extremist threats are diverse, nearly 76% of relevant research focused exclusively on Islamist extremism, leaving other forms, such as right-wing extremism, critically understudied. The theme of imbalance was mirrored in the researchers’ expertise: Nearly three quarters of the authors came from computer science and engineering, while only 12% represented fields like criminology or security studies. This suggests that, while powerful technological tools are being developed, the social and behavioral expertise needed to understand the why behind the propaganda these tools are meant to address may be missing.
Furthermore, the science is linguistically limited; despite the global nature of the problem, 74% of the studies focused on English-language content, with far less attention paid to other languages, like Arabic. Most of these studies employed text analytics (using computers to recognize patterns in text to predict future behavior), making a tool’s familiarity with a written language critical. A heavy reliance on government funding (42%) and the concentration of research in only a few nations suggest that the current science of extremist content detection is shaped by a very narrow set of national security priorities.
What Do We Know?
While the detection tools we reviewed are relatively new, they are rapidly becoming more sophisticated. Early versions often struggled with the messy reality of human speech, but newer algorithms are learning to account for the nuances of digital communication. This includes tracking hashtags, hyperlinks, and even disguised words or intentional typos designed to evade detection. 19 Crucially, research is beginning to show a clear link between the rise of extremist content online and actual terrorist activity in the physical world. 20
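To make the idea of handling disguised words and hashtags concrete, here is a minimal Python sketch of a text-normalization step that a detector might run before matching. The substitution table and the function names `normalize` and `extract_hashtags` are our own assumptions for illustration; the reviewed studies do not specify this procedure.

```python
import re

# Hypothetical character substitutions sometimes used to disguise words.
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(text: str) -> str:
    """Lowercase, undo simple substitutions, drop separators inside words,
    and collapse runs of three or more identical characters to one."""
    text = text.lower().translate(SUBSTITUTIONS)
    text = re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)  # j.o.i.n -> join
    text = re.sub(r"(.)\1{2,}", r"\1", text)          # jooooin -> join
    return text


def extract_hashtags(text: str) -> list[str]:
    """Return lowercased hashtags without the leading '#'."""
    return re.findall(r"#(\w+)", text.lower())
```

A rule set this blunt also rewrites legitimate digits (a year such as 2013 would be altered), so it illustrates the idea rather than a deployable preprocessing step.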
Beyond just spotting keywords, modern detection science is expanding in three key ways:
tracking actors and tactics: Algorithms are now being used to identify specific terrorists and their supporters,21–24 recognize recruitment strategies, 25 and flag lone wolf actors or individuals at high risk of committing violence.26,27
cultural and religious nuance: There is a growing effort to teach algorithms the specific cultural and religious language used by different groups, moving beyond a one-size-fits-all vocabulary to a better understanding of a message’s context.28–30
exploring the shadows: Researchers are moving past mainstream social media to test these tools in dark web forums25,31 and to identify the automated accounts (bots) often used to amplify extremist propaganda. 22
Issues & Limitations
A significant limitation in the current field is the lack of collaboration between the people building detection technology and the people studying human behavior. Most research on detection algorithms is conducted by experts in STEM. While these developers are technically skilled, their studies often fail to clearly define what extremism actually looks like or provide a theory of radicalization to guide the algorithms’ development. Without input from social scientists, these tools often lack context-aware features, 31 which may mean they struggle to distinguish between a dangerous extremist and someone simply discussing a controversial topic. For instance, while an algorithm might flag the word martyrdom as a high-risk indicator of Islamist extremism, a social scientist can provide the theological and linguistic context to distinguish between a devotional religious discussion and a tactical call for political violence. When research is not grounded in the knowledge of the human and social dimensions of countering violent extremism, it risks producing powerful tools that do not actually solve the social problem they were intended to address. 23 For detection to be effective, the algorithm’s logic must be informed by social science knowledge.
Data bottlenecks are another major hurdle in this field. Because algorithms learn by example, they require annotated databases; that is, large collections of data where human experts have already labeled what is extremist and what is not. 20 Most of the available datasets used to test and evaluate CDAs focus almost exclusively on Islamist extremism,20,23,30,32 which is rarely acknowledged as a research limitation. However, this limitation might be partially attributed to the reliance on publicly available datasets, which creates a significant blind spot: Because researchers rely on these limited datasets to train and test their tools, the resulting algorithms may be highly effective at spotting one type of threat while remaining blind to others, such as the far right. Furthermore, this bias is compounded by language. Researchers in English language-dominant nations frequently overlook non-English data, which severely limits the ability of detection tools to identify and classify propaganda in other languages.23,24 While some scholars have called for a broader linguistic scope, 33 the current reality is that the most powerful detection tools are often linguistically and ideologically narrow.
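The blind spot described above can be illustrated with a toy classifier. The sketch below is a small multinomial naive Bayes model written from scratch; it is not the method of any reviewed study, and the annotated examples use hypothetical placeholder tokens (`slogan_a1`, `slogan_b1`, and so on) in place of real content. The training data deliberately contain extremist examples from only one ideology ("a").

```python
import math
from collections import Counter


def train(examples: list[tuple[str, str]]) -> dict:
    """Fit a tiny multinomial naive Bayes model from (text, label) pairs."""
    word_counts: dict[str, Counter] = {}
    label_counts: Counter = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "labels": label_counts, "vocab": vocab}


def predict(model: dict, text: str) -> str:
    """Return the label with the highest smoothed log-probability."""
    total = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = "", -math.inf
    for label, n in model["labels"].items():
        counts = model["words"][label]
        denom = sum(counts.values()) + vocab_size  # add-one smoothing
        score = math.log(n / total)
        for word in text.lower().split():
            if word in model["vocab"]:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical annotated data: extremist examples cover ideology "a" only.
annotated = [
    ("slogan_a1 slogan_a2 join", "extremist"),
    ("slogan_a2 slogan_a3 rise", "extremist"),
    ("weather is nice today", "benign"),
    ("match report from today", "benign"),
    ("lunch was nice", "benign"),
]
model = train(annotated)
seen = predict(model, "slogan_a1 rise")               # ideology "a" vocabulary
unseen = predict(model, "slogan_b1 slogan_b2 today")  # ideology "b" vocabulary
```

Here `seen` is labeled extremist, while `unseen` is labeled benign: the second post's distinctive tokens never appeared in the annotated data, so the model has nothing to learn from. Real detectors are far more sophisticated, but the dependence on what the annotated database covers is the same.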
Much of what we know about creating and testing detection algorithms on extremist content comes from a single source: Twitter (now X). Because the platform’s application programming interface was originally free and easy to use, this digital doorway, which allows researchers to collect data, became the primary training ground for these tools.21–23,30,31 However, this convenience led to a narrow research focus. Because Twitter is primarily text based, the field has neglected audiovisual content, like videos and images, which are increasingly used in modern propaganda. Recent changes to X’s data access policies have caused a decline in this research, 34 highlighting a deeper problem: a lack of cooperation from tech companies. Most researchers work independently, using outdated, public datasets because social media companies are often unwilling to share their real-time data. 35 This creates a lab versus reality gap. While algorithm effectiveness was claimed in every study we reviewed, it is impossible to know if that success would hold up on different platforms or against live, real-time propaganda. Ultimately, without a partnership between independent researchers and the platforms themselves, even the most effective tools may never be put into practice.
Tool 2: Content Moderation Algorithms
When we looked for research on CMAs—the tools that actually decide what to do with extremist content once detected—we found a surprising lack of evidence. While we initially identified 51 potential studies, only four met our standards for evaluating these tools. Our analysis showed that interest in this topic spiked in 2020. This was likely driven by the COVID-19 pandemic, as the world moved online for work and social interaction, making digital moderation a more urgent priority. However, the resulting body of research is still very small. Reviewing the four studies we analyzed revealed the following:
researcher disciplines: Unlike studies on detection tools, which were mostly conducted by engineers, these moderation studies were largely led by experts in communication, media, and criminology.
ideological focus: In these studies, researchers did a better job of looking at both Islamist and right-wing extremism.
methods: Interestingly, one study involved moving beyond just analyzing algorithms; the researcher used interviews and documents to understand the human side of how moderation works. 36
The fact that so few studies exist on how algorithms handle extremist content moderation suggests that while propaganda is getting easier to detect, knowing how to manage it properly is still in the early stages.
What Do We Know?
The few studies included in this review showed that moderation tools are evolving to better limit the influence of extremist propaganda.36,37 For example, Borelli 36 reported a number of instances wherein social media corporations went beyond what the law requires for moderating extremist content. However, these efforts may be highly fluid, vary significantly by platform, and shift over time with companies’ priorities (e.g., due to political climate or new ownership). Overall, the research showed a tale of two digital worlds:
the successes: On platforms like Reddit and Gab, algorithms have prevented extremist content from being promoted to at-risk users, such as those who have interacted with far-right content. 38 Similarly, ISIS-related videos on YouTube have become increasingly rare. 32
the conflicts or failures: However, moderation tools often run headfirst into recommendation algorithms (RAs), the systems designed to keep users engaged by showing them more of what they like. On YouTube, for instance, once users engage with far-right material, RAs make them twice as likely to be shown even more extreme content and 1.39 times more likely to be shown fringe content. 38
RAs create a filter bubble, where a platform’s own recommendation system acts as a megaphone for the very propaganda its moderation tools are trying to silence. While some recent studies suggest these bubbles might not be as inescapable for the average user as once feared,39,40 the tug of war between moderation and engagement remains a primary challenge.
Content moderation decisions are rarely purely technical; they are frequently driven by local politics and public pressure. For instance, it was only after the 2019 far-right terrorist attack in Christchurch that Facebook officially extended its hate speech policies to include white nationalism and separatism. 36 However, corporate public statements do not always match technical reality. A year before the Christchurch attack, Facebook representatives told U.S. lawmakers the platform was already addressing white nationalist content—a claim that later appeared inaccurate. This suggests that technology companies may prioritize avoiding political scrutiny or legal trouble over the actual performance of their moderation tools. Ultimately, social media platforms—and, by extension, their moderation tools—are caught in the crossfire of a larger debate over where to draw the line between public safety and personal expression. Public tragedies, for instance, often generate demands for stricter moderation of extremist content, while some political movements argue against moderation entirely, alleging that algorithms unfairly limit free speech and censor certain (often conservative) political and cultural viewpoints.
Issues & Limitations
While moderation algorithms are becoming more effective, they face three significant challenges that constrain their real-world effectiveness:
the problem of context: Training a computer to spot a specific banned symbol or word is easy. Teaching it subjectivity—the ability to tell if a post is actually supporting a terrorist group or just reporting on one—is much harder. Extremists often construct propaganda using subtle or hidden meanings that computers regularly miss. 36
political blind spots: Technology companies like Google and X often rely on official government lists, such as the U.S. State Department’s list of foreign terrorist organizations, to decide what to ban. However, these lists are inherently political. They often focus heavily on foreign threats while ignoring domestic extremist groups, leaving algorithms blind to home-grown violence. 36
extremist content migration: Even when an algorithm successfully removes extremist content from a major platform, the underlying problem may persist. Researchers have documented displacement, whereby extremists simply move their conversations to smaller, less-regulated platforms after being banned from places like Facebook or X. 37
In short, effective moderation on one site does not always translate into a meaningful reduction in extremism overall.
Beyond these technical hurdles, there is a deeper concern regarding how these tools are tested and who they truly serve. Much of the current research is based on small, isolated snapshots of data, yet there is a striking lack of replication; that is, testing these findings on larger and more diverse datasets to ensure results hold consistently across digital contexts.32,36–38 This lack of research makes it difficult to trust that these tools will work consistently in real-world scenarios. Furthermore, a fundamental shift is underway in how the internet is policed. In the past, moderation often relied on user communities to flag bad behavior. The move toward automated algorithms shifts the power away from users and into the hands of the corporations that own and operate these platforms. 36 In effect, the values of a private company, rather than those of a user community, become the law of the land. Despite slight improvements in corporate transparency, there are still no meaningful checks and balances or independent systems to ensure equitable use of these tools. Without such accountability mechanisms, the public has no reliable way of knowing why certain users are silenced while others are not, leaving these powerful automated systems largely unaccountable.
A Mechanism Scheme of Algorithmic Counterpropaganda
Our review revealed a clear pattern in how researchers and algorithmic designers currently address the problem of propaganda on digital communication platforms. Although researchers are spread across the globe and supported by a range of governmental and institutional actors, the scope of the work itself remains remarkably narrow. Almost all of these tools are developed by computer scientists and engineers, address only English- and Arabic-language content, and are designed primarily to detect Islamist extremism. Furthermore, while research on how content is actually moderated is increasing, the field is still in its infancy. Moving forward requires breaking out of these silos. The datasets used to train moderation algorithms must be expanded beyond their current limits to reflect the full diversity of global extremist threats. Taking stock of these gaps is the first step toward building a more comprehensive and balanced system.
We propose a new model, or mechanism scheme, that maps how algorithms can effectively counter propaganda (see Figure 1). When we talk about a mechanism, we are describing the specific gears and levers that produce a predictable result. Rather than assuming an algorithm will work as intended, this scheme provides a structural blueprint of the people, data, and processes required to produce a safer online environment. 41 By creating this abstract representation, we can better understand how each part of the system interacts 42 to interrupt the spread of extremist content.

Figure 1. Mechanism scheme for countering propaganda algorithmically
Our model is built on a basic understanding of how radicalization unfolds online. It starts with a believer, someone who accepts an extremist ideology as valid. Belief alone, however, is not enough to radicalize others through digital media; radicalization progresses only when that individual decides to leverage digital tools to influence others and turns that intent into action. At that point, the believer becomes a propagandist, actively creating and sharing extremist content. Exposure to such propaganda is well established as a significant factor in how others become radicalized, but this path alone is not sufficient. While we represent the complexity of the journey toward radicalization with a single arrow in our model, we also identify digital users’ exposure to propaganda as the critical turning point. By understanding the path from a single person’s belief to the moment a new user encounters their message, we can identify precisely where algorithmic intervention should occur to disrupt the chain and reduce the likelihood of radicalization. By extension, if propagandists come to believe that algorithmic moderation will substantially impair their ability to radicalize consumers, they may be dissuaded from producing and disseminating digital propaganda altogether.
To date, the predominant approach to combating digital propaganda has been simply to limit its visibility. Our research highlights a two-step process for this digital filtering. First, detection algorithms act as scouts, scanning millions of posts to find content that looks like propaganda. Once content is flagged, moderation algorithms act as the decision mechanism, determining the appropriate response based on platform-specific rules, such as deleting, demoting, or shadow banning (i.e., undisclosed suppression of an account or its content from search results, feeds, or RAs). There is strong evidence that the first part of this system works; we found at least 29 studies reporting that these detection tools effectively identify extremist material, particularly content associated with Islamist extremism. In other words, these tools have become highly proficient at finding the needle in the haystack. The next challenge is determining what to do after finding it. Algorithmic application of moderation strategies, intended to have a direct effect on digital users’ exposure, is one option.
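The two-step process described above can be sketched as a small Python pipeline. Everything named here is a hypothetical illustration: the `Policy` thresholds, the `toy_detector` scoring rule, and its placeholder watchlist are assumptions, not a description of any platform's actual system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Hypothetical platform-specific thresholds on a 0-1 detection score."""
    delete_at: float
    demote_at: float


def moderate(score: float, policy: Policy) -> str:
    """Step 2: map a detection score to a governance outcome."""
    if score >= policy.delete_at:
        return "delete"
    if score >= policy.demote_at:
        return "demote"
    return "allow"


def pipeline(post: str, detect: Callable[[str], float], policy: Policy) -> str:
    """Step 1: detection scores the post. Step 2: moderation picks an action."""
    return moderate(detect(post), policy)


def toy_detector(post: str) -> float:
    """Stand-in detector: share of tokens found in a placeholder watchlist."""
    watchlist = {"slogan_a1", "slogan_a2"}
    tokens = post.lower().split()
    return sum(t in watchlist for t in tokens) / len(tokens) if tokens else 0.0


# Two hypothetical platforms with different rules.
strict = Policy(delete_at=0.5, demote_at=0.2)
lenient = Policy(delete_at=0.9, demote_at=0.5)
```

Running `pipeline("slogan_a1 slogan_a2 now", toy_detector, strict)` and then the same call with `lenient` shows the design point: the detection score is identical, but the outcome differs because the moderation step encodes each platform's own rules.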
Ultimately, every algorithm is the product of human design. What designers understand about propaganda and their goals for addressing it shape their personal perspectives and influence how they design technology to behave. If a designer views extremism through a narrow lens, such as focusing only on Islamist threats, the resulting algorithm will be correspondingly limited. This can create dangerous ripple effects, including:
definitional problems: When propaganda is defined solely as an attempt to spread a single ideology, the resulting algorithmic code will be blind to other forms of extremism, including far-right and domestic threats.
logic problems: Similarly, algorithms constructed on a narrow conceptualization of propaganda will produce algorithmic protocols ill-suited to propaganda in all its forms.
detection training problems: Algorithms learn by practice. Even carefully designed moderation policies are useless if the detection tool fails to identify relevant content in the first place. Systems trained exclusively on one type of extremist material will become experts at spotting that content while failing to recognize anything else. A poorly designed detection system acts as a bottleneck, preventing enforcement of even the best policies.
moderation policy problems: Algorithms trained to apply moderation protocols that are limited in scope and adequacy will produce enforcement that reflects these limitations.
In short, an algorithm is only as smart as its creator’s programming. As such, algorithm designers are critical to countering propaganda. If their understanding of the world is flawed or their goals are constrained, the tools built will be similarly flawed or constrained. A platform’s logic is essentially a reflection of the goals and conceptual frameworks of the people who programmed it. Of course, what goes around comes around; designers’ understanding of propaganda is itself shaped by prior knowledge of propaganda dissemination, originating in ways similar to those depicted in our figure.
Our model identifies precisely where the current science falls short and where social scientists can provide the missing answers. Building more effective tools necessitates stepping inside the design room to understand more about what these creators know about propaganda, what their express goals are, how they define propaganda, and the logics that inform their content moderation policies. Looking at this problem through a new lens raises more ambitious questions about the future of digital safety. Could algorithms be designed to impede the creation or dissemination of propaganda in the first place, rather than responding to it after the fact? Could technology help counter the effects of propaganda after it is seen, or even support de-radicalization efforts? Answering these questions requires a new kind of partnership. Social scientists and technology developers must work side by side to design and test new technical features that may be deployed for these purposes. Only by combining human insight with technical power is it possible to move beyond reactive filtering and toward a genuinely effective algorithmic defense against extremism.
While not exhaustive, we offer the following recommendations to facilitate interdisciplinary collaboration between social scientists and engineers, with the aim of improving propaganda detection and disrupting the online-to-offline violence pathway across the ideological spectrum:
auditing and policy: To address the designer understanding of propaganda and the designer goals concerning propaganda depicted in Figure 1, we recommend that social scientists lead independent audits of moderation outcomes, while engineers incorporate accountability-by-design principles into early-stage algorithmic development. Funding bodies should further require public–private transparency agreements as a prerequisite for research and algorithmic development support. This recommendation focuses on increasing transparency between public and private sectors regarding what content is considered extremist, why, and how it is moderated.
data access and ethical design: Engineers and the organizations they represent should establish transparent, privacy-preserving processes for providing external researchers with secure data access, while social scientists develop the ethical frameworks and behavioral benchmarks necessary to guide algorithmic refinement. This model requires funders to prioritize long-term, interdisciplinary grants that reward cross-sector validation over siloed technical speed. This recommendation addresses and seeks to improve the designer-level definitions of propaganda and the training of algorithmic tools outlined in the mechanism scheme.
implementation and tool building: We recommend that social scientists create and share annotated datasets of extremist content to support engineers in training more context-aware detection and moderation algorithms. Funders can support this effort by establishing shared digital sandboxes in which engineers and social scientists can codesign and test moderation tools in real time. This recommendation focuses primarily on the algorithmic moderation of propaganda in our scheme to determine what works and to improve the workflow between these stakeholders.
Algorithms are often presented as the ultimate keys to unlocking breakthroughs in countering propaganda, and generally to solving most problems facing human society. 1 However, our review suggests taking a closer look at the people holding these keys. Progress requires moving beyond a narrow focus on one type of extremism and critically assessing the small group of designers who fashion these algorithms. These individuals hold the power to shape digital and lived realities and direct people toward one or more central tendencies, creating, in the process, the bounds of what is considered extreme. We should know considerably more about them, as well as those who own the code and can employ others to deploy it as they see fit. 43 The future of our digital reality should not be designed in the dark.
Supplemental Material
Supplemental material, sj-docx-1-bsx-10.1177_23794607261459488 for A mechanism scheme for improving extremist content detection & moderation algorithms by Jeremiah Perez-Torres and Kwan-Lamar Blount-Hill in Behavioral Science & Policy
Footnotes
Acknowledgements
Jeremiah Perez-Torres extends his sincere gratitude to his coauthor, Kwan-Lamar Blount-Hill, whose insights and collaborative spirit were indispensable in developing the mechanism scheme and refining the scope of this review. He is also deeply grateful to the anonymous reviewers for their rigorous feedback and constructive critiques; their perspectives significantly strengthened the clarity and impact of this manuscript. Any remaining errors or omissions are entirely his own.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
