Abstract
Anna’s Archive is a public index of illicit digital libraries that presents itself as a moral and technical infrastructure for global access to knowledge. Drawing on digital infrastructure ethnography, platform hermeneutics, and genealogical method, this article examines how the project negotiates legitimacy through metadata, discourse, and infrastructural design. I argue that Anna’s Archive transforms the logic of piracy into an infrastructure of visibility that mirrors and contests institutional archives, enacting a novel illicit politics of knowledge circulation. The study identifies four modes of legitimacy—pragmatic, moral, community, and cognitive—through which the archive sustains authority while remaining outside legal frameworks. By situating these practices within broader debates on platform accountability, data governance, and infrastructural justice, the article shows how shadow infrastructures expose the power relations and politics of access that underlie contemporary regimes of information extraction.
Introduction: The politics of the shadow archive
In the current landscape of digital knowledge, Anna’s Archive has emerged as a critical counter-institutional infrastructure. Although the project is often categorized as a pirate library, this classification fails to capture its role as a structural response to the enclosures of the academic publishing industry. Rather than simply redistributing stolen content, it performs a specific kind of data craft that challenges the monopoly of formal institutions over global metadata and preservation. The project currently hosts over 150 TB of data and maintains a 1.3-billion-record scrape of WorldCat (Anna’s Archive, 2023a). This scale suggests that the archive is not a temporary disruption but a durable feature of the digital commons, one that enacts what is best described as an illicit politics of circulation.
The existence of such an archive reveals the coloniality of care inherent in institutional knowledge systems (El Aidi, 2025). Formal libraries and commercial publishers operate through a logic of exclusion, where access is a commodity and preservation is a privilege. Anna’s Archive operates outside these legal and economic norms to enact an alternative model of stewardship. It foregrounds what earlier systems have historically rendered invisible (Sefat, 2023). By centralizing access to disparate shadow libraries like Sci-Hub and Library Genesis (LibGen), the project creates a provisional epistemic order. This order allows researchers to navigate a fragmented digital landscape through a single, reliable interface.
The central argument of this article is that Anna’s Archive maintains its stability through a continuous performance of legitimacy. Because the archive cannot rely on state protection or legal recognition, it must justify its existence through technical competence and moral positioning. This legitimacy is performed across three registers defined by Suchman (1995): pragmatic utility, moral justification, and cognitive normalization. To these, the article adds a fourth, community legitimacy, through which the archive converts its users into a sustaining collective. The archive proves its pragmatic value by delivering reliable access to millions of users. It establishes moral authority by framing its activities as a necessary crusade for the universal right to knowledge. Over time, it achieves cognitive legitimacy as its use becomes a taken-for-granted routine within the academic community.
To dismiss these activities as mere acts of piracy belittles their political and social significance. However, rather than making an “end-run” around the legal question, this study engages directly with the history of intellectual property through the work of Adrian Johns and the “pirate philosophy” of Gary Hall to analyze how the archive’s specific technical and moral performance forces a political redefinition of knowledge.
This performance of legitimacy now faces a new and significant challenge. The rise of large language models (LLMs) and mass data extraction has turned shadow archives into a key substrate for AI training. Companies such as Meta and DeepSeek (Lu et al., 2024) use these collections to build commercial products, creating a crisis of the commons. This extraction threatens to undermine the moral and community legitimacy that the archive has carefully constructed. If the archive exists to serve the public, its role as a data broker for corporate AI interests creates a fundamental tension. This article examines how Anna’s Archive navigates this crisis by leveraging its infrastructure as a site of political negotiation.
The following sections begin by establishing the methodological framework and researcher positionality necessary for navigating the ethical complexities of an illicit infrastructure. By prioritizing the methodological stance, this article seeks to make transparent the interpretive tools and reflexive distance required to study an extra-legal system before formally developing the theoretical registers of legitimacy through which the archive is subsequently analyzed.
Methodology and researcher positionality
This section outlines the specific methodological framework used to examine these tensions, including platform, discursive, and genealogical analysis.
First, I conducted an in-depth platform analysis, mapping the architecture and user experience of Anna’s Archive. This involved a close examination of its search interface, metadata structure, download options (including BitTorrent and InterPlanetary File System [IPFS] integration), and the “About,” “Blog,” and “Datasets” sections of its website. In this study, I use Gillespie’s (2010) idea of platform materialism as a methodological tool that lets me treat Anna’s Archive not as a neutral interface but as an arrangement of technical and organizational choices that shape participation and encode political commitments. This framing clarifies why I interpret the platform’s architecture as a site of infrastructural politics rather than only as an object of discourse, and it establishes the analytic pivot that the subsequent analysis of legitimacy develops. The analysis focused on how the platform’s design choices (what it foregrounds, what it hides, and how it guides user behavior) function as active interventions in the politics of knowledge rather than as neutral choices.
Second, I performed a close reading and discursive analysis of the archive’s public communications. This corpus includes all blog posts published by the Anna’s Archive team, as well as key interviews and public statements. The analysis draws on methods from critical discourse analysis, paying attention to how the archive “frames its mission, justifies its actions, and constructs its identity” (Van Leeuwen, 2007). I focused on the specific rhetorical strategies, moral claims, and political language used to build legitimacy with its users and the broader public. This discursive work is a component of the “symbolic work” (Geertz, 1973) that infrastructures must do to sustain themselves, especially oppositional infrastructures, which have to “articulate alternative normative orders” (Ewick and Silbey, 1998: 192).
Because Anna’s Archive operates as a resilient shadow infrastructure designed to evade legal takedowns, its documentation frequently migrates across mirror domains. For this reason, references to blog posts and documentation rely on archived copies or documented mirrors when available. This approach ensures that claims remain verifiable even when the live site changes or disappears.
Third, this article employs a genealogical method to trace the historical lineage of Anna’s Archive. This involves two key histories: the legacy of Soviet-era samizdat and the more recent history of post-Soviet digital piracy (e.g., LibGen). This approach seeks to understand how the moral and political logics of these earlier movements have been “adapted, repurposed, and re-imagined” in a contemporary digital context. By comparing the “analog infrastructures of samizdat” (Komaromi, 2022) with the “decentralized protocols of digital archives,” the analysis highlights continuities in the struggle for epistemic freedom. This historical lens helps to illuminate what is new about platforms like Anna’s Archive and what is a remediation of long-standing tactics of resistance. While samizdat and shadow libraries both inhabit extra-legal spaces, they are categorically different: samizdat manages the politics of illicit circulation (the “what” of forbidden content), while Anna’s Archive enacts an illicit politics of circulation (the “how” of the infrastructure itself).
By employing these methods, I aim to understand the platform on its own terms, as a values-driven, politically significant project, and to analyze its implications using the critical tools of media and infrastructure studies. I situate this study within digital infrastructure ethnography. Within that frame, I use platform hermeneutics to analyze interface behavior and infrastructural signals, critical discourse analysis to interpret the archive’s public communications, and a genealogical method to trace how illicit archives inherit and rework earlier oppositional infrastructures.
My analysis of Anna’s Archive requires an account of my position as a researcher studying an oppositional and illicit infrastructure from within institutional systems that the archive criticizes. I work with the archive’s public-facing code, metadata, mirrors, and discursive materials, but I do not participate in its maintenance practices or community forums. This interpretive distance shapes how I read the project, and it reminds me that I approach the platform through the vantage point of someone embedded in the academic knowledge economy that the archive challenges. A reflexive stance clarifies that my interpretive approach participates in the same field of infrastructural politics that the archive inhabits, even as it remains analytically distinct from the archive’s operational project. As a researcher, I must acknowledge that my reliance on the archive’s self-published discourse risks reproducing the project’s own myth-building. That narrative frames anonymity as a strategic necessity for survival rather than as a mechanism for avoiding financial or ethical oversight.
I also recognize that the archive’s anonymity, legal precarity, and protective obfuscation limit what any researcher can know about its internal decision-making. I do not view these limits merely as obstacles; I treat them as part of the archive’s political design. Studying a system built to resist visibility requires disciplined inference rather than speculation, so I try to restrict my claims to what the archive makes public and to what its technical operations reveal through metadata structures and infrastructural signals.
Furthermore, the absolute anonymity of the Anna’s Archive team complicates the researcher’s ability to verify claims of “non-profit” status. This lack of transparency functions as a protective shield against legal repercussions, but it also creates a “black box” of accountability.
I approach the ethical stakes of this work by drawing a sharp distinction between analyzing an illicit infrastructure and endorsing or facilitating its practices. My goal is to understand how the archive constructs legitimacy, not to judge or promote its methods of distribution. I attempt to study its discursive claims and infrastructural arrangements without reproducing or enabling its illicit acts. I hope that this stance keeps the analysis focused on political form rather than operational detail.
This reflexive awareness of the archive’s illicit nature and of my own institutional position provides the necessary context for analyzing how the project performs legitimacy across pragmatic, moral, cognitive, and community registers. With the analytical tools of platform hermeneutics and discursive inquiry in place, the following section develops the theoretical lens of “technological legitimacy” to dissect the specific strategies Anna’s Archive employs to maintain stability while remaining outside legal frameworks.
The politics of technological legitimacy
This section develops a framing of technological legitimacy to examine how Anna’s Archive positions itself within contested infrastructures of preservation and access, and how it maintains stability while operating outside legal norms. Illicit organizations, from pirate networks to darknet markets, develop their own governance practices, norms, and forms of justification (Shortland and Varese, 2016). The core of my argument treats legitimacy as a performance that works through discourse and infrastructure to justify the archive’s existence to its audiences. For a project like Anna’s Archive, which cannot call on state power or police enforcement, the work of establishing legitimacy never stops. Suchman (1995) defines legitimacy as “a generalized perception or assumption that the actions of an entity are desirable, proper, or appropriate within some socially constructed system of norms, values, beliefs, and definitions” (p. 574). Using this framework, I distinguish between pragmatic, moral, and cognitive legitimacy to provide a lens for dissecting the strategies employed by Anna’s Archive. While Suchman’s framework provides the analytical foundation for this study, the empirical case of Anna’s Archive reveals a fourth mode that his organizational sociology does not anticipate: community legitimacy, the internally constitutive process by which an organization converts its audience into a sustaining collective. This fourth mode is developed in the analysis that follows.
Pragmatic legitimacy depends on the immediate self-interest of an organization’s most proximate audiences. Suchman noted that this form of legitimacy remains precarious because it depends on continued usefulness; if the platform stops delivering, that support disappears. This exchange-based legitimacy is transactional: the user offers attention and tacit support in return for access. Yet the platform strengthens this relationship by framing access as a restored right, which encourages users to see themselves as part of a broader project rather than as lone beneficiaries.
Moral legitimacy rests on a normative judgment about whether an organization acts in ways that align with shared values. It centers on whether an action counts as the right thing to do, a question that sits at the core of Anna’s Archive’s public narrative. This moral claim is essential for building a resilient community that will defend the platform even when it faces technical difficulties or legal threats. This goes beyond simple justification and enters the realm of what Van Leeuwen (2007) called “moralization,” where social practices are framed in terms of values and ethics. This process echoes what Ewick and Silbey (1998) described as the “subversive story” of legality, a mode of critique in which marginalized actors challenge legal authority by articulating alternative normative orders.
Cognitive legitimacy is the subtlest and perhaps the most powerful form. Following Suchman (1995), I treat cognitive legitimacy not as a claim about individual psychological or neurological states but as a property of institutionalized practice: the collective establishment of a norm. Legitimacy in this sense emerges when an infrastructure becomes so embedded in routine workflows that its use no longer requires explicit justification, and when an organization’s existence and methods are naturalized as part of the accepted landscape of “how things are done.” In organizational and human-computer interaction (HCI) literature, this “taken-for-grantedness” likewise describes routines established by groups of actors rather than individual mental processes. The relevant question is therefore not whether individual users consciously endorse the archive, but whether its presence becomes normalized within everyday scholarly activity. For an illicit infrastructure, achieving this status against the backdrop of what Lamdan (2022) identified as “data cartels” (the companies that control and monopolize our information) requires an active renegotiation of established norms. The more the platform is used, the more it comes to be seen as a natural and indispensable part of the information ecosystem.
Suchman (1995) positioned cognitive legitimacy as the most durable of the three types precisely because it operates below the threshold of deliberate evaluation. Where pragmatic legitimacy depends on continued usefulness and moral legitimacy requires active normative agreement, cognitive legitimacy functions through what Berger and Luckmann (1966) called the “social construction of reality”: the process by which repeated social practices become sedimented into taken-for-granted facts about how the world works. In organizational sociology, this concept draws on DiMaggio and Powell’s (1983) account of institutional isomorphism, where organizations come to resemble one another not only through coercion or conscious mimicry but also because certain forms of practice become cognitively “the only way” to accomplish a task. Applied to information infrastructures, this means that cognitive legitimacy is achieved not when users approve of a platform, but when they stop thinking about it altogether, when it recedes into the background of practice as a natural feature of the research environment. Bowker and Star (1999) described this as the hallmark of mature infrastructure: it becomes visible only when it breaks down. For an illicit archive, reaching this state of infrastructural invisibility is a profound political achievement, since it means the system has successfully normalized a practice that remains formally prohibited.
To deepen this framework, this article also incorporates theories of technological legitimacy. This concept acknowledges that technologies themselves are not neutral but are imbued with values and can become actors in legitimation struggles. Data infrastructures, as Thylstrup (2022) argued, never operate neutrally. A technology acquires legitimacy when it is perceived as effective, reliable, and aligned with social values. Anna’s Archive leverages this by positioning its decentralized, open-source, user-supported infrastructure as inherently more legitimate than the centralized, proprietary, and profit-driven infrastructures of corporate publishers. The technologies themselves function as rhetorical strategy: the use of BitTorrent and IPFS stresses decentralization and user empowerment, which in turn bolsters the moral legitimacy of the entire project. This aligns with the concept of “critical infrastructure” (Ratto, 2011), where the system itself sustains alternative knowledge practices while exposing the logic of those who would exploit it.
The concept of platform governance helps to illuminate the ways in which Anna’s Archive constitutes itself as a political actor. As Plantin et al. (2018) argued, such infrastructures are not neutral utilities but “sites of contestation and political struggle” (p. 306). The archive’s technical decisions (what to collect, how to organize it, who gains access) are fundamental acts of governance. By shaping conduct through access (Foucault, 2007), the archive performs a kind of “infrastructural politics” (Plantin et al., 2018). As Sefat (2023) noted, this allows the system to foreground “not only what they store, but also what earlier systems have historically excluded or rendered invisible” (p. 142), effectively challenging the boundaries of established knowledge institutions.
This mode of governance expresses a particular vision for the future of the digital commons, constituting what Gray and Suri (2019) described as the “vernacular politics” of infrastructure (p. 165). By demonstrating an alternative model based on community values rather than corporate profit, the archive intervenes in the knowledge economy. It functions as part of a “counter-supply chain” (Caliskan et al., 2025: 8), a mechanism that actively reroutes value away from private accumulation and toward public preservation.
The dynamics of legitimacy described above hinge not only on narrative authority but also on the infrastructural systems that make such authority actionable. If the preceding discussion traced how legitimacy is performed through discourse and technological myth-making, what follows turns to the material architectures that sustain these performances. Understanding Anna’s Archive therefore requires shifting attention from questions of rhetoric to questions of infrastructure, accountability, and the political work of visibility itself. One of the primary ways this material performance manifests is through “data craft,” the technical labor of gathering and cleaning metadata to create a new, visible order of knowledge.
Data craft and the material performance of legitimacy
Anna’s Archive builds its pragmatic legitimacy through a significant display of technical expertise (Anna’s Archive, 2025a). This is most evident in the 1.3-billion-record WorldCat scrape (Anna’s Archive, 2023b). The project uses this metadata collection as what the archive calls a “TODO list” for the preservation of human culture. By scraping the most comprehensive bibliographic database in the world, the archive has created a map of absence. This process involves a significant investment in metadata labor, which Acker and Donovan (2019) argued is a political act of making things visible. Such expertise serves as a material counterweight to the logic of platform capitalism, where value is typically extracted by “black-boxing” proprietary metadata and access protocols to maintain market dominance.
This participatory mode of data craft functions as a counter-epistemology to the opaque, proprietary regimes of commercial bibliographic monopolies such as the Online Computer Library Center (OCLC) (Acker, 2018). Whereas corporate systems treat the Golden Record as a commercial asset defined by deduplication algorithms that obscure their internal logic (OCLC, 2025), Anna’s Archive deploys a transparency-first approach to metadata merging. By opening its deduplication scripts to audit, the archive opposes the logic of the hash to the logic of the sale. It prioritizes the material preservation of every distinct file variant over the hygiene of the database (Anna’s Archive, 2024). This practice challenges the knowledge apartheid enforced by vendor-supplied metadata, where the aboutness of a text is frequently determined by commercial categories rather than scholarly consensus (Oparinde et al., 2024). Such proprietary regimes impose a state of epistemic enclosure that subordinates the organization of knowledge to the imperatives of the market. Anna’s Archive resists this commodification of classification by demonstrating that the power to index is inseparable from the power to define what counts as knowledge. The archive’s refusal to black-box its merging logic transforms metadata management into an act of epistemic disobedience (Mignolo, 2009), one that reclaims the power to classify human knowledge from the opacity of corporate platforms (Burrell, 2016). This reclamation mirrors the strategy of The Palantir Files, where open metadata archives serve as evidentiary counter-infrastructures that render the operations of secretive data firms accountable (Iliadis and Acker, 2024). By treating bibliographic records as forensic evidence of corporate enclosure, the archive seeks to transform technical schemas into tools of infrastructural inversion that expose the political economy of access.
These metadata practices reorient the archive’s temporal and political mission by generating a “TODO list” of unpreserved culture (Anna’s Archive, 2023a). By quantifying the gap between the total known bibliographic record and the files currently held in shadow preservation, Anna’s Archive transmutes metadata into a map of absence (Anna’s Archive, 2023a). Rather than merely listing available downloads, the counter-catalog performs a field-level reordering that determines how rarity and obscurity are produced or corrected. It directs collective labor toward works that are rare, uniquely underfocused, or most vulnerable to erasure (Anna’s Archive, 2024). Following Acker (2018), this labor is an act of data craft, a communicative infrastructure of preservation that determines the “aboutness” of a text. By standardizing and merging these records, the archive performs the societal work of setting the criteria of knowledge, effectively deciding what becomes visible and retrievable for the global community. In this sense, the archive’s metadata converts the abstract vastness of global publishing into a concrete infrastructural mandate, ensuring that these criteria are defined not by copyright status but by the material imperative to prevent the extinction of the historical record.
Anna’s Archive builds durability on a decentralized material system that keeps the project free of any single legal or geographic point of failure. Research on decentralized preservation shows how shadow libraries distribute storage and redundancy across networks that avoid central hosts (Kjellström, 2025). Anna’s Archive works within this logic by distributing bandwidth and storage across BitTorrent and IPFS rather than relying on a single server. These choices reduce the impact of domain seizures because the system continues to function even as gateways fall. They also reject the concentrated control that defines commercial platforms. This design depends on an economic structure that supports ongoing labor and maintenance. The archive keeps access open to advance a right-to-knowledge ethic, but it also uses a paid tier to fund hosting costs, indexing work, and the replication tasks that keep mirrors stable. The system resembles a freemium model, but, judging from the financial data it makes visible and the rhetoric it uses, it redirects revenue toward infrastructural durability rather than profit. This reframing turns payment from a service-based transaction into a contribution that aligns the project’s economic structure with its political commitments.
The technical architecture of decentralized networks functions as a site of political negotiation, where routing protocols actively dictate the limits of institutional control. Studies of IPFS show how routing and storage practices weaken central authority and frustrate enforcement efforts (Herman et al., 2026; Santoso et al., 2025). Gateways struggle to filter denylisted content because the protocol distributes responsibility across many nodes. Anna’s Archive uses this behavior to route around blockages and maintain access. These material decisions shape the project’s political stance and its capacity to withstand intervention. This decentralized architecture is not a self-sustaining machine. Rather, its political power is contingent upon a sociotechnical layer of labor that transforms technical participation into institutional resistance. Salamon and Saunders (2024) showed how routine acts of maintenance and production can challenge platform domination. Their insight clarifies the role of seeding torrents, refreshing mirrors, and repairing corrupted metadata. These practices confront the forms of control that publishers and data brokers try to impose. In this context, maintenance operates as resistance because it keeps an oppositional archive active under sustained pressure.
While Anna’s Archive inherits the dissident spirit of samizdat and the disruptive tactics of early shadow libraries, its navigation of the contested digital commons rests on a novel conception of legitimacy. By positioning itself as both a continuation and a critique of its predecessors, the archive adopts the posture that Balan (2024) identified in the discursive construction of protest technologies, where digital platforms challenge existing paradigms while becoming sites of legitimacy struggle. This performance does not seek legal sanction, an impossibility for a project of its nature. Instead, it is directed at multiple audiences: its global user base, the wider open access community, institutional adversaries, and commercial publishers. Through this communication, the archive aims to build a resilient social and technical foundation for its continued existence. This performance unfolds through the three registers of legitimacy (pragmatic, moral, and cognitive) and a fourth dimension: the construction of community legitimacy through inward-oriented communication.
Oppositional archives rely first on usefulness; if they fail to deliver, they lose their reason to exist. Anna’s Archive builds pragmatic legitimacy by demonstrating infrastructural competence, a form of what Salamon and Saunders (2024) called the arts of digital resistance, in which routine maintenance and metadata labor become political acts that construct counter-infrastructures against extractive corporate platforms. Anna’s Archive presents itself as a meta-library that aggregates shadow libraries into a unified index. Its landing page claims it will preserve all the knowledge and culture of humanity, a statement tied to concrete aggregation practices that pull together collections from LibGen, Z-Library, and Sci-Hub. By scraping and merging these catalogs, the project reduces the friction users once faced when navigating multiple unstable sources. A single search interface delivers immediate utility, which anchors its pragmatic legitimacy. The archive highlights infrastructural skill through specific preservation projects. In one blog post, the organizers describe scraping and releasing WorldCat holdings data to build a “TODO list” of rare books to archive. Volunteers bypassed anti-scraping protections, harvested records, and deduplicated 170 million entries (Anna’s Archive, 2023a). This display of technical expertise is framed as essential for preserving at-risk knowledge. This emphasis on capacity echoes broader conversations about metadata politics and infrastructural accountability (Acker, 2018; Iliadis and Acker, 2024).
User experience reinforces this legitimacy. The interface avoids the deceptive ads and heavy clutter common on other illicit sites, and its minimalist design mirrors institutional archives and signals professionalism. The option for paid users to access faster downloads introduces a tiered structure found on legal platforms while keeping core access free. For Anna’s Archive, pragmatic legitimacy represents its simplest mode of justification and emerges clearly through the platform’s ability to deliver dependable, immediate access to materials. Every successful download and every located article reinforces its utility. Pragmatic legitimacy also depends on resilience. Anna’s Archive treats domain seizures, legal pressure (e.g., Brown, 2026), and technical attacks as ongoing threats and responds with distributed infrastructure. The organizers encourage users to seed torrents, run IPFS nodes, and access the site through Tor. These decentralized strategies strengthen the archive’s durability and transform access into a collective project. As Bodó (2018a) argued, resilience becomes a form of infrastructural politics in which a system survives because its users help maintain it.
Pragmatic utility cannot sustain an oppositional archive on its own. Anna’s Archive strengthens its position through moral legitimacy, which it constructs through a steady discursive effort on its public-facing blog. These posts function as editorials that articulate why illicit circulation should count as a public good. One strand grounds legitimacy in access as a right. The organizers reject terms such as piracy or theft and frame their work as a response to structural inequality. They condemn a knowledge economy that locks scholarship behind paywalls for corporate profit and argue that humanity’s heritage cannot be entrusted to corporations. They present shadow libraries as necessary because they can do things that other institutions are not allowed to do. This aligns with Sefat’s (2023) argument that infrastructures like Anna’s Archive foreground what official systems exclude or render invisible. A second strand builds legitimacy through transparency: blog posts that explain goals, methods, and finances operate as trust-building devices. This strategy resonates with Balan’s (2024) account of protest infrastructures, where communicative clarity and visibility support legitimacy. By adopting archival metaphors and institutional language, Anna’s Archive positions itself within long-running debates about cultural stewardship and expertise (Cristofari, 2024).
A third strand addresses authorship and labor. The organizers criticize publishers rather than writers and frame the current system as exploitative. They describe large academic and trade publishers as rent-seeking (Van der Sar, 2022) and cast the archive as a custodian that preserves cultural memory against privatization (Bodó, 2018a). This framing lets them claim solidarity with authors while defining the publishing industry as the primary antagonist. More broadly, the organizers pursue discursive legitimation by crafting a narrative that rejects the language of theft and casts their work as a moral crusade, in line with recent scholarship on the performative dimension of protest infrastructure (Balan, 2024). The site’s administrators position the archive as a corrective to the injustices of a knowledge economy that commodifies and restricts access to information. Their blog posts and public statements appeal to a higher set of values: the universal right to knowledge, the importance of cultural preservation, and the critique of corporate greed. In doing so, they seek to build a constituency that supports the archive for its principles as well as its utility. Across these strands, Anna’s Archive turns moral argumentation into infrastructural justification. The blog posts perform a moralization in which practices become legitimate because they respond to injustice, not because they follow legal norms. This moral legitimacy anchors the project’s claim that illicit preservation becomes an ethical obligation when official institutions limit access.
A fourth dimension of Anna’s Archive’s legitimacy work cannot be adequately captured by Suchman’s tripartite framework. Suchman’s three modes—pragmatic, moral, and cognitive—are all, in their different ways, externally oriented: they address audiences outside the organization, justifying its existence to users, publics, and institutional observers. What they do not account for is the internally constitutive work of legitimation: the process by which an organization transforms its audience from passive beneficiaries into active stakeholders who sustain the project from within. This article terms this fourth mode community legitimacy, drawing on social movement theory’s account of collective identity formation (Cammaerts, 2015) and Ewick and Silbey’s (1998) concept of the “subversive story”—the internal narrative through which marginalized actors articulate a shared normative order that justifies their resistance to legal authority.
Community legitimacy is not simply a subset of moral legitimacy directed inward. Where moral legitimacy asks whether outsiders judge the organization’s actions as right, community legitimacy asks whether participants understand themselves as members of a collective project with shared stakes in its survival. For an illicit infrastructure like Anna’s Archive, this distinction is structurally significant: the archive cannot rely on state protection and is therefore dependent on its user base not just for access traffic, but for the distributed labor—seeding torrents, running nodes, monitoring mirrors—that keeps the system alive. Building community legitimacy is therefore not a supplementary communicative strategy but an operational necessity. As Bodó (2018a) argued in his account of guerrilla open access movements, the resilience of illicit archives is inseparable from the structural solidarity of the communities they cultivate. Anna’s Archive pursues this cultivation through three specific communicative practices.
First is the creation of a shared narrative of struggle and resilience. The blog posts frequently update the community on the various threats the archive faces. By sharing these struggles, the organizers strive to foster a sense of shared adversity and collective identity, inviting the community to see itself as being in the trenches alongside the archivists. This narrative transforms the use of the archive from a simple download into an act of defiance and solidarity, and it is evident in the archive’s regular communications. For instance, in a post titled “An update from the team,” the volunteers acknowledge facing “increased attacks on our mission” while reassuring the community that they are still alive (Anna’s Archive, 2025b). The post simultaneously warns of the disappearance of a partner LibGen fork and cautions against a new, untrusted mirror site called WeLib. Such communication underscores the precariousness of the archivists’ work and the importance of community solidarity, framing use of the archive as participation in an ongoing, high-stakes conflict over information access. Cammaerts (2015) described this as the constitutive function of movement communication: by naming threats, identifying adversaries, and framing participation as consequential, an organization generates the collective identity that sustains mobilization beyond any individual instrumental motive. The blog post does not merely inform; it interpellates readers as movement participants.
Second is the practice of enlisting the community in the project of preservation. As noted under pragmatic legitimacy, the calls to seed torrents and run IPFS nodes are both technical strategies for resilience and communicative acts. Anna’s Archive frames these calls to action as empowering users, giving them a tangible stake in the project’s survival. Enlisting users in the core functions of the archive casts archiving as a form of collective maintenance and aims to foster a sense of ownership and responsibility in the user base. The platform is no longer something they run; it is something we maintain. This practice operationalizes the “subversive story” (Ewick and Silbey, 1998): users are invited not merely to consume an alternative normative order but to physically sustain it through their material labor. The act of seeding a torrent becomes a small but concrete enactment of the claim that knowledge belongs to everyone.
Third is the fostering of a shared ethos and value system. The blog posts attempt to articulate a coherent political and ethical philosophy. This serves an educational function within the community, addressing the economics of academic publishing, the history of the open access movement, and the philosophical arguments for the free circulation of knowledge. This educational function helps transform a user base into a social movement, one that participates in the social process through which collective actors articulate their interests and voice grievances (Cammaerts, 2015). In this sense, community legitimacy functions as the precondition for the archive’s other legitimacy claims: without a constituency committed to internal mobilization on behalf of the project, neither pragmatic reliability nor cognitive normalization could be sustained against the legal and technical pressures the archive continuously faces.
The ultimate goal of any legitimating performance is to achieve cognitive legitimacy: to become so deeply embedded in the practices and assumptions of a community that its existence is taken for granted. For an illicit organization like Anna’s Archive, this is the most difficult and most important task. The primary mechanism for this is the platform’s design and user experience. The minimalist interface creates a sense of normalcy and stability. There is no hint of the subcultural, transgressive aesthetic that often characterizes hacker or pirate communities. The experience is designed to be utterly banal, and this very routinization is a powerful legitimating force. By creating a seamless, user-friendly interface that mimics and surpasses the functionality of legitimate digital libraries, Anna’s Archive works to normalize its own existence.
Yet this normalization does not rely on a single, monolithic cognitive frame. Uploading pirated works to an archive, downloading a text, and maintaining the infrastructure are distinct forms of use with varying justifications and norms, and they involve different relationships to the archive’s infrastructure: maintainers sustain the technical system, uploaders expand the corpus, and readers access texts for research or teaching. The reasons a scholar from the Global South accesses the archive to circumvent exclusionary data cartels (Lamdan, 2022) differ fundamentally from those of an undergraduate at a community college seeking a textbook or a federally funded scientist bypassing a paywall. Cognitive legitimacy therefore emerges not from a single shared motivation but from the convergence of heterogeneous practices that gradually stabilize the archive’s presence within scholarly workflows. By accommodating these divergent moral and pragmatic ends within a single reliable interface, Anna’s Archive turns the act of checking it into a collectively established routine. This gradual embedding of the platform into the diverse everyday practices of its users is a key mechanism for building cognitive legitimacy. 1
A second mechanism is the strategic use of metadata. By creating a searchable catalog, Anna’s Archive performs the institutional functions of a traditional library (see Figure 1). The attention to metadata is a form of data craft that signals a long-term commitment to stewardship and preservation (Liang, 2012). This work is not just technical; it is symbolic. It parallels what Mayernik (2019) described as “metadata accounts,” in which the rigorous production and classification of metadata allow data to achieve evidential status and stability in scientific research. By providing these transparent metadata accounts, Anna’s Archive asserts its own cognitive legitimacy alongside, and in opposition to, proprietary platforms of enclosure (Mirowski, 2023). It invokes the “pirate function” (Philip, 2005) not as an intellectual property violation but as a mechanism to reclaim the technological authorship of the public collection. In doing so, it constructs an alternative epistemic order grounded in the ethical value of sharing, the technical logic of indexing, and the historical memory of access denial (Bodó, 2018a).
Recent work in media and communication studies positions metadata as a form of epistemic infrastructure that makes knowledge both possible and governable. As Acker (2018) demonstrated, metadata is not simply a descriptive layer but an active site of knowledge production that determines what becomes visible, credible, and retrievable. Her concept of data craft highlights how the manipulation and maintenance of metadata constitute epistemic labor: the invisible work of crafting legitimacy through classification and persistence.

Figure 1. Screenshot of Anna’s Archive showing the collection of libraries it maintains (annas-archive.org).
Shin et al. (2025) extend this view by showing how contemporary fact-checking systems operate as epistemic infrastructures in their own right. They legitimate information through routines of labeling, tagging, and verification, producing what the authors call infrastructural truth. Read alongside Acker, their work suggests that metadata practices are never neutral acts of description but forms of epistemic governance: mechanisms through which credibility is stabilized and contested. Anna’s Archive performs a similar function. By aggregating and reconciling bibliographic metadata drawn from libraries, publishers, and shadow repositories, it transforms metadata into a mode of public truth work that both imitates and critiques institutional regimes of knowledge validation, and it makes previously fragmented collections searchable through a single interface. These practices do not invent new bibliographic standards. Rather, the archive repurposes long-standing cataloging conventions developed within library and information science, including normalized author names and structured authority records. Metadata thus becomes a political tool that anchors the system’s material foundation in practices that support credibility rather than extraction.
Kelly (2025) deepens this conversation by describing situated epistemic infrastructures that maintain coherence within fragmented knowledge ecologies. His framework clarifies how infrastructures like Anna’s Archive do not merely store or classify information but generate provisional epistemic order in environments of uncertainty. When read together, Acker’s data craft, Shin et al.’s infrastructural truth, and Kelly’s situated epistemic infrastructures converge on a shared argument: metadata is a political technology of knowing. Anna’s Archive exemplifies this convergence by performing epistemic labor that is both infrastructural and insurgent, producing cognitive legitimacy through the very classifications that organize and contest the politics of knowledge.
Finally, the archive achieves cognitive legitimacy through its scale and reliability. As the platform becomes the go-to source for an ever-larger number of users, it increasingly functions not as an illicit alternative but as integrated infrastructure (see Figure 2), a trajectory supported by empirical evidence from analogous platforms. Greshake’s (2017) analysis of 28 million Sci-Hub downloads reveals usage patterns consistent with routine research practice: 35% of downloads were of recently published articles (post-2013), suggesting integration into current scholarly work rather than archival retrieval. Herman et al. (2023) document that shadow libraries have achieved what they term “infrastructural normalization,” becoming embedded in researchers’ everyday workflows to the point that, in some contexts, their illegality is rarely mentioned. This process of normalization is perhaps the most significant threat to the existing regime of intellectual property. The cultural significance of Anna’s Archive rests not only in the content it circulates but in the future it both imagines and helps to create. Through the mundane, repeated, and successful act of finding and downloading a text, Anna’s Archive is slowly but surely winning the argument by force of everyday practice. This epistemic framing also foreshadows the archive’s entanglement with artificial intelligence (AI), where metadata itself becomes the substrate of new legitimacy struggles over training data, ownership, and accountability.

Figure 2. Screenshot of Anna’s Archive search interface displaying metadata fields and download options.
The empirical measurement of cognitive legitimacy for Anna’s Archive specifically is complicated by the platform’s protective anonymity and users’ reluctance to document illicit activity. However, a substantial body of research on analogous shadow libraries provides robust evidence of normalization. Karaganis’ (2018) edited volume presents country-level empirical studies documenting how shadow libraries function as routine infrastructure across Argentina, Brazil, India, Poland, Russia, and South Africa. Quantitative analyses of download patterns reveal global usage that includes not only developing-world academics but also researchers from the United States, Germany, and the United Kingdom (Bodó et al., 2020; Gardner et al., 2017). Critically, this usage extends beyond contexts of absolute necessity: Bodó et al. (2020) found that richer regions are the most intensive users, and Andročec (2017) documented that researchers download open-access papers from Sci-Hub for convenience, preferring a single platform regardless of paywall status. Qualitative research provides direct accounts of routinization: Kjellström’s (2019) interviews with Swedish PhD students and librarians revealed overwhelmingly positive perceptions of shadow libraries, which were described as necessary responses to publisher hegemony. Most compellingly, Herman et al.’s (2023) interview-based research documents that Sci-Hub use in France has undergone what Merton (1968) called “obliteration by incorporation”: it functions as a common source like Google Scholar, with its illegality rarely mentioned. These studies collectively demonstrate that shadow libraries have achieved what Bowker and Star (1999) described as infrastructural invisibility: they have become sufficiently normalized that their use escapes explicit justification.
Although direct testimony about Anna’s Archive usage is difficult to obtain, the platform’s sustained growth and its integration into discussions of scholarly infrastructure suggest that it follows this same trajectory toward cognitive legitimacy.
Data-infrastructure politics and platform accountability
Having established how legitimacy is performed through material infrastructure, it is now possible to examine how these technical arrangements function as mechanisms of platform accountability. Anna’s Archive operates not only as a moral argument for openness but as what Iliadis and Acker (2024) described as a “counter-infrastructure,” a system that exposes the opacity of corporate data regimes by performing radical transparency. Their analysis of Palantir’s clandestine data infrastructures reveals how accountability is deliberately obfuscated; indeed, they argue that “opacity is not simply a side effect of complex software; it is a technical achievement” (p. 2).
This politics of visibility must be situated within a wider field of infrastructural inequality. As El Aidi (2025) argued in her critique of “technology for good” initiatives, infrastructures that claim humanitarian or ethical motives often reinscribe colonial hierarchies of care and control. Similarly, Vučković and Bilić (2025) showed that digital infrastructures redistribute power unevenly, producing what they call “infrastructural spillovers” that privilege certain publics while marginalizing others. Reading Anna’s Archive through these lenses clarifies that its open architecture is not merely an ethical preference, but a material intervention in the geopolitics of knowledge. By decentralizing its storage and mirroring logic, the archive redistributes infrastructural capacity across borders, countering what El Aidi calls the “coloniality of care” (p. 4) with a participatory model of collective maintenance.
Whereas mainstream infrastructures centralize data and responsibility, emerging decentralized models imagine accountability as a protocolic function. De Filippi (2024) conceptualized accountability as a technical affordance embedded in code and protocol design rather than an external legal obligation. Anna’s Archive draws on this shift through redundancy scripts, torrent infrastructure, and metadata versioning that build procedural accountability into the system. Calzada’s (2025) analysis of Web3 governance underscores how such decentralized architectures experiment with new forms of “infrastructural citizenship” (p. 7), in which transparency replaces regulation as the moral basis of governance. Toward a similar end, Anna’s Archive performs accountability through its infrastructure, which allows its organizers to claim rhetorically that they have transformed openness itself into a normative practice.
Yet, as Ferrari and Graham (2021) remind us, infrastructures are never total systems of control; they contain fissures, points of instability, contestation, and potential resistance. Anna’s Archive exploits these fissures in the global data regime, leveraging the cracks in copyright enforcement, hosting jurisdictions, and algorithmic indexing to sustain a public commons. Through these acts, it makes infrastructural power both visible and vulnerable. In this sense, the archive’s transparency is not utopian but strategic: a form of infrastructural subversion that seeks to demonstrate that accountability requires visibility and collective stewardship. These material and economic substrates form the foundation on which the archive’s political interventions are staged.
Piracy and the illicit archive
The stability of Anna’s Archive is constantly challenged by its classification as an illicit operation. This makes the critique that it is “mere piracy” the primary discursive threat to its legitimacy. However, as Adrian Johns argued in Piracy: The Intellectual Property Wars from Gutenberg to Gates (2009), “piracy” has never been a simple or technical violation of the law. Instead, it is a historical force that has consistently exposed and negotiated the fragile boundaries defining legitimate media and authorship across every major media transition since the printing press.
Anna’s Archive inherits this history not as a repetition of theft but as a rearticulation of how legitimacy is produced through circulation. By engaging Johns’ work, we move past the simplistic moral judgment of the term “piracy” to analyze how the archive’s specific technical and moral performance forces a political redefinition of knowledge in the digital age. This aligns with Gary Hall’s reframing of piracy as a philosophical method (2016a). Hall argues that piracy unsettles conventional ideas of proprietorial authorship by enacting new modes of thought, creation, and publication (Hall, 2016a: xiv). Read in this light, the project’s refusal of the “piracy” label is itself a central act of epistemic intervention, one that tests the limits of institutional knowledge systems.
This intervention operates by exploiting the “constitutive” nature of piracy described by Johns. The designation of “pirate” has historically functioned less as a description of a criminal act and more as a mechanism for market stabilization. In his analysis of the Stationers’ Company in 17th-century London, Johns demonstrates that the term was weaponized to suppress independent printers whom Johns calls interlopers, traders who threatened the guild’s monopoly. These “Land-Pirats” (Johns, 2009: 41) were not merely stealing property but contesting the guild’s control over the civility of the printed word. By labeling unauthorized reproduction as piracy, the Stationers successfully conflated their private commercial interests with the public order. Anna’s Archive operates within a similar dynamic today as it challenges the oligopolistic control of major academic publishers who conflate their paywalls with the integrity of science itself.
Furthermore, the history of intellectual property reveals that legitimacy is often a temporal rather than a moral distinction. Johns details how the United States spent its first century as what he calls a pirate nation (Johns, 2009: 179), systematically reprinting British literature without remuneration to foster its own literacy. Only after American publishers had accumulated sufficient capital and content did the nation pivot to become a staunch defender of international copyright. This historical trajectory suggests that the illicit status of Anna’s Archive is not an ontological defect but a phase in the redistribution of epistemic power. Echoing what Johns (2009: 179) calls the “pirate revolution” of the nineteenth century, the archive functions as a critical infrastructure for the Global South, creating the access needed to participate in a global knowledge economy that the formal market fails to serve.
In the context of what Hall (2016b) described as the “Uberfication of the university,” where knowledge is increasingly enclosed by platform capitalism, the pirate archive assumes a new role: that of preserver. Hall argues that we must move beyond liberal humanist critiques of copyright to adopt a “pirate philosophy” that actively builds the infrastructure for a “digital posthumanities.” Anna’s Archive exemplifies this praxis by mirroring shadow libraries like Sci-Hub and ensuring the survival of the scholarly record against legal enclosure. When Anna’s Archive scrapes metadata or backs up Spotify libraries, it is not engaging in simple theft but in a radical act of care for the cultural commons. It exposes the fragility of legitimate platforms on which authorized access is precarious. The piracy of Anna’s Archive is therefore an epistemic struggle that forces the legitimate system to confront its own exclusions. Just as the Land-Pirats of the 17th century forced the articulation of statutory copyright, the shadow librarians of the 21st century are forcing a renegotiation of who has the right to steward the sum of human knowledge.
A genealogy of oppositional archives: From samizdat to shadow libraries
The historical lineage of such oppositional infrastructures can be traced through the intertwined histories of piracy and shadow librarianship, where questions of access and legitimacy first converged in material form. The legitimation strategies of Anna’s Archive are rooted in a long history of oppositional media practices and infrastructures designed to circumvent state and corporate control over information. In tracing this lineage, this section focuses on two key precursors: the clandestine textual cultures of Soviet samizdat and the anarchic digital commons of early post-Soviet shadow libraries.
Samizdat and its stakes
Samizdat (“self-published”) refers to non-state writing, copying, and circulation in the Soviet Union and Eastern Bloc from the 1950s onward. It countered the state’s monopoly over presses and publishing houses and built a cultural sphere that operated on its own terms. Komaromi (2022) argued that samizdat did more than move forbidden texts: it sustained practices of publicness, authorship, and meaning-making that enabled readers and writers to imagine alternatives to official discourse (pp. 19, 82). Illegality signaled authenticity and helped constitute a moral order distinct from state authority. Samizdat was, above all, a politics of illicit knowledge circulation: because the texts themselves were forbidden by the state, circulating them was an ethical act that created authority outside state structures.
The practice remained material and risky. It relied on typewriters, carbon paper, and physical hand-to-hand circulation. Yurchak (2006) reads the labor of retyping as part of its political meaning, since each reproduction signaled solidarity and shared risk. Kind-Kovács and Labov (2017) emphasized how the texture of the copied page (its faint ink, errors, and degradation) shaped reading and conveyed political weight. Wciślik (2021) similarly noted how time-intensive reproduction produced social value by marking participation in clandestine networks. Samizdat circulated censored political essays, religious writing, and the work of authors who later became canonical (Komaromi, 2022; Toker, 2019).
Legitimacy in this system operated primarily through moral force. Samizdat offered little pragmatic legitimacy, since copying was slow and dangerous, and its cognitive legitimacy remained limited to small interpretive communities. Bourg (2017) described participation as a “sovereign ethical act” that created authority outside state structures. Sefat (2023) called this a politics of absence: making suppressed knowledge legible through material reproduction. These dynamics create the conceptual bridge to digital shadow libraries. As Bodó (2018b) noted, illicit archives inherit a similar moral logic: in contexts of unequal access, circumventing legal regimes becomes a preservation strategy rather than a violation. Digital shadow libraries extend this lineage by shifting from material copying to computational reproduction while retaining the claim that exclusionary systems justify oppositional circulation.
From analog to digital: Early shadow libraries
The movement from the analog infrastructures of samizdat to the digital architectures of early shadow libraries signals a structural transformation in the illicit politics of knowledge circulation. What had once depended on personal copying and embodied trust became encoded in scripts and distributed networks. The early digital libraries of the post-Soviet internet, projects such as LibGen and the nascent Sci-Hub, translated moral resistance into infrastructural practice, treating access not as protest but as a logistical problem to be solved. This shift coincided with a new economic landscape: the collapse of public institutions and the rise of commercial publishing barriers created new forms of scarcity. Within this vacuum, digital archives inherited the ethical imperatives of samizdat while adopting the scalability and anonymity of networked systems, producing an oppositional infrastructure grounded in technical coherence rather than individual defiance.
The first generation of digital shadow libraries emerged from this environment of scarcity and experimentation. As Bodó (2018b: 25) observed, the post-Soviet information landscape fostered a pragmatic ethos that treated the reproduction of knowledge as both an ethical and technical challenge. Volunteer communities began to digitize academic texts, store them on personal servers, and share them through peer-to-peer systems that mirrored the informal exchange networks of late socialism. Komaromi (2022) noted that these projects carried forward the collectivist sensibility of samizdat but reframed it within the logics of computing and access. What had once been the circulation of typewritten pages became the maintenance of databases and protocols. This transformation linked moral justification to infrastructural labor, establishing the foundation for a politics of preservation that would later define LibGen and Sci-Hub.
The collapse of the Soviet Union created not only a political vacuum but also a deep informational one. In the disordered 1990s, the emerging Russian internet, or RuNet, became an experimental field for rebuilding access to knowledge. The end of state censorship coincided with the rise of market-driven scarcity, as Western publishers imposed prices far beyond the reach of most post-Soviet universities. The disintegration of public library systems left scholars and students without basic materials. In this setting, digital archives evolved from improvised stopgaps into durable infrastructures. They combined the collectivist ethics of samizdat with the pragmatic needs of post-Soviet science, forming a networked response to the new conditions of exclusion that accompanied market transition.
In this context, the first digital shadow libraries transposed the dissident ethos of samizdat into the digital realm. Early projects, often run by anonymous individuals, began by digitizing vast libraries of scientific and technical texts, filling the gap left by collapsed national institutions. These initiatives were driven by a pragmatic need to sustain scientific research, and their legitimacy was accordingly pragmatic at first: they provided essential resources for a community in crisis. This ecosystem thrived in part because low levels of enforcement and a widespread culture of copying in Russia allowed these projects to grow with minimal friction (Bodó, 2018b; Komaromi, 2022). Preexisting digital communities and a “permissive environment” on the RuNet laid the groundwork for these distributed infrastructures (Gorny, 2009).
However, they also carried forward the moral legacy of samizdat. As one of the founders of LibGen stated, the project was animated by a belief that “knowledge should be accessible to everyone” (Bodó, 2016). 2 Platforms like LibGen and the later, more focused Sci-Hub represent a crucial evolutionary step, leveraging the internet’s scalability, anonymity, and global distribution. Sci-Hub, in particular, has been especially active in discursive legitimation. Its founder, Alexandra Elbakyan, has consistently articulated a moral argument against academic publishers, framing Sci-Hub as an act of civil disobedience in the service of global science (Elbakyan and Bozkurt, 2021).
While shadow library projects often invoke a universalist ethics of openness, this framing has drawn critique from scholars who argue that openness is never neutral. Christen and Anderson (2019) showed that universal access models can reproduce extractive relationships when they disregard community protocols of consent or stewardship. Caswell and Cifor (2022) likewise argued for a relational ethics of care that situates access within specific cultural and political contexts. Reading Anna’s Archive through these perspectives complicates its emancipatory claims and places its politics of access within broader debates about accountability and cultural sovereignty.
It is also useful to trace the evolution of samizdat itself. Ann Komaromi (2022) distinguished literary samizdat from its later, more explicitly civic role, noting that it evolved “from a predominant concern with poetry and fiction towards an ever greater emphasis on journalistic and documentary writing” (p. 71). This transformation shows how the use of samizdat shifted from cultural preservation to political contestation. Skilling (1989) argued that these developments placed samizdat within broader currents of civil society under late communism, revealing its latent potential as a civic infrastructure.
Key differences nevertheless separate samizdat from shadow libraries. Samizdat relied on personal practices and interpersonal trust, with texts passed from hand to hand. Shadow libraries, by contrast, scale distribution beyond personal networks through file downloads and torrent protocols. Where samizdat foregrounded authorial risk, shadow libraries anonymize and collectivize labor, embedding it in scripts, scrapers, and catalog schemas. These differences mark an epistemological shift: samizdat began as a weapon of resistance, whereas its digital successors, shadow libraries, frame access as a public good within an extractive knowledge economy.
To understand the emergence of Anna’s Archive, we must locate it within specific infrastructural ruptures rather than treat it merely as a natural evolution of shadow libraries. The most significant of these ruptures occurred in November 2022, when the United States Department of Justice seized over 200 domains associated with Z-Library and two alleged operators were arrested in Argentina. Earlier shadow libraries such as LibGen and Sci-Hub had already demonstrated that decentralized digital repositories could outperform institutional access systems in scale and reliability; the sudden enclosure of one of the internet’s largest shadow libraries now exposed the fragility of centralized illicit infrastructures. In direct response to this crisis of access, the anonymous team behind Anna’s Archive launched the platform not as a standalone repository but as a decentralized, open-source “meta-library.” Rather than hosting a single proprietary collection, the platform aggregates indexes and metadata from multiple shadow repositories, functioning as a coordination layer across an already distributed ecosystem. Its explicit, foundational purpose was to back up existing shadow libraries, including Sci-Hub, LibGen, and Z-Library, so that their code and catalogs would remain resilient against future state and corporate takedowns (Van der Sar, 2022).
In this lineage, Anna’s Archive marks a significant departure. While it inherits the moral outrage of Sci-Hub and the comprehensive archiving impulse of LibGen, it introduces a new element into its performance of legitimacy: an ideological commitment to infrastructural openness and self-conscious public narration. Where earlier projects resisted through anonymity or pragmatism, Anna’s Archive names the emerging risk of corporate knowledge capture and explicitly frames its direct action as resistance to that trend. This strategy of overt communication is designed to navigate contemporary platform politics and the challenges of AI-driven information extraction.
The transition from the physical risk of samizdat to the algorithmic scale of early digital libraries represents more than a technical upgrade. It marks the beginning of a shift toward cognitive legitimacy. Early platforms like LibGen proved that shadow libraries could be more reliable and comprehensive than the portals provided by formal universities. As these platforms moved from improvised stopgaps to durable infrastructures, using them began to lose its transgressive quality. For a generation of researchers, these sites became the primary, taken-for-granted starting point for scholarly work. This normalization prepared the ground for Anna’s Archive to function not as a radical alternative but as a routine and indispensable component of the digital research environment. Yet the same normalization, which turned these archives into massive, reliable, and taken-for-granted reservoirs of data, has also exposed them to commercial AI development: recent reporting and legal filings indicate that LLM training datasets such as Books3 incorporated materials sourced from shadow libraries.
Contesting legitimacy in the age of AI
The normalization of shadow libraries has inadvertently transformed them into the primary substrate for a new form of institutional enclosure: mass data extraction for AI. This extraction is not merely theoretical but is occurring at an industrial scale, turning search systems and infrastructures into what Caliskan et al. (2025) identified as “supply chains” of extraction and control. The Books3 dataset, for example, was compiled from the contents of the private torrent tracker Bibliotik, itself a shadow library, and integrated into the 800GB “Pile” dataset used for language modeling. In 2023, authors sued Meta, alleging that the company’s LLaMA models had been trained on Books3 and, through it, on shadow library collections. This process operationalizes enclosure by ingesting a freely accessible, collectively maintained public commons and transforming it into proprietary, closed-source commercial AI products.
The legitimacy framework developed in the preceding sections provides the analytical tools to understand this crisis, but it also reveals a structural shift. Until this point, the archive’s four modes of legitimacy operated in mutual reinforcement. Pragmatic utility drew users to the platform, moral discourse gave their participation a political justification, community labor kept the infrastructure running, and, over time, cognitive normalization rendered the entire arrangement invisible as ordinary research practice. Mass AI extraction fractures this alignment by placing the four registers in tension with one another.
At the level of pragmatic legitimacy, the archive faces a paradox. The infrastructural competence that established its authority, namely its scale and metadata precision, is precisely what makes it an irresistible target for corporate extraction. AI scraping validates the archive’s utility even as it threatens to overwhelm its servers. The SSH File Transfer Protocol (SFTP) access offered to firms like DeepSeek (Van der Sar, 2026) is a survival mechanism: an attempt to convert unsustainable scraping loads into managed partnerships that keep the infrastructure operational. This pragmatic pivot, however, comes at a cost to the archive’s moral legitimacy, which had rested on a clear opposition between corporate enclosure and public access. AI extraction introduces a destabilizing third term: capital extracting value from the commons itself. The archive must now distinguish between what it frames as legitimate openness, in which human researchers exercise a right to knowledge, and illegitimate openness, in which corporate actors harvest collectively maintained metadata to build proprietary products. This forced distinction exposes the limits of the universalist rhetoric that sustained the archive’s moral authority and shows that not all acts of access serve the same political ends.
The implications for community legitimacy are arguably more severe. The collective labor of seeders, mirror operators, metadata maintainers, and volunteer developers was premised on a commons ethic in which participation sustained a shared infrastructure against corporate enclosure. When that labor is non-consensually harvested and monetized by AI firms, it reproduces the dynamic of alienated extraction that the archive was built to resist. Voluntary maintenance is transformed, without consent, into unpaid data labor for commercial AI development. The archive’s cognitive legitimacy is also unsettled. Its taken-for-granted status as a routine feature of the research environment depended on the assumption of a non-commercial commons. The revelation that the archive functions simultaneously as a research tool and as a training corpus for corporate AI renders its status newly visible and contestable, reversing the infrastructural invisibility (Bowker and Star, 1999) on which cognitive legitimacy depends.
Anna’s Archive’s own server telemetry offers material evidence of this extractive shift. As the archive’s operators noted, “We’ve noticed that a lot of the AI scraping activity isn’t aimed at downloading books, but at crawling our metadata” (Anna’s Archive, 2024). This suggests that AI developers are not using the archive for traditional reading or humanistic research; rather, they appear to be systematically harvesting its highly structured, cross-referenced catalog to map relationships between texts for LLM training. By logging scraping behavior and flagging these AI training patterns, the archive turns its own visibility into a form of “counter-surveillance.” This dynamic echoes what Mark Andrejevic (2007) termed lateral surveillance, a peer-based watching that imitates top–down control while undermining its authority.
Faced with this crisis across all four registers of legitimacy, Anna’s Archive has attempted to negotiate its position by offering conditional, institutional access, most notably through pragmatic partnerships with AI developers like DeepSeek. To manage the immense server load caused by AI scraping, the archive proposed granting companies direct access to its databases via SFTP, a proposition actively explored by major AI developers (Van der Sar, 2026). The archive justifies this arrangement by arguing that companies training AI on its collection should credit and financially support the infrastructure they rely on. This move is best understood as a pragmatic legitimacy strategy, a calculated trade-off that sacrifices some ideological purity to preserve the infrastructural capacity on which all other forms of legitimacy depend. Yet while it is framed as a way to sustain the infrastructure, this financial entanglement risks transforming the project from an open knowledge commons into a commercial data broker, undermining the moral and community legitimacy the archive has built over the preceding years.
For a platform like Anna’s Archive, whose vast, structured, and freely accessible collection makes it an extraordinarily valuable resource for training AI models, large-scale AI extraction presents both an existential threat and a unique opportunity. It forces the project to confront a new set of questions that cut across every register of its authority. How does a radical open-access project maintain its pragmatic utility without becoming complicit in corporate extraction? Can it sustain moral authority when its data serves the very enclosure it opposes, and can it retain the loyalty of the community whose labor is being alienated without consent? More broadly, how does it preserve the taken-for-grantedness that cognitive legitimacy requires when the conditions of its existence have become newly visible and contested?
The archive’s response has been to extend its performance of legitimacy into this new domain, attempting to repair the fractures that AI extraction has introduced. As laid out in a blog post titled “AI & Copyright,” the organizers explicitly address the “dual-use” nature of their collection (Anna’s Archive, 2025b). Their response is not to lock down the data, but to propose a framework of ethical access, drawing a sharp distinction between “proprietary, closed-source AI models” and “non-commercial research purposes.” This distinction amounts to an act of moral legitimacy work, an attempt to preserve the archive’s ethical narrative by differentiating between forms of openness that serve the commons and those that serve capital. They do not reject AI outright but push back against unacknowledged extraction, articulating a politics of conditional openness. As they state: “If you want to train your AI on the collective knowledge of humanity, you should at least credit the infrastructure you’re using—and support it.” This move reframes the archive as an active agent negotiating access, reflecting what Bodó et al. (2021) called the “politics of legitimacy” in digital libraries. By making its metadata public and auditable, the archive attempts to shift accountability from a legal obligation to a protocolic function. This forces a confrontation between the archive’s radical transparency and the extractive, opaque supply chains of generative AI development.
This performance of principled distinction is an act of “infrastructural politics.” While the archive cannot technically prevent scraping, it uses its platform to make a powerful normative statement. This strategy aligns with what Salamon and Saunders (2024) called “the arts of digital resistance,” where actors mobilize infrastructure not just to evade domination, but to reassert control over meaning. It raises new dilemmas around responsibility and the politics of visibility (Amoore, 2020) and highlights the tension decentralized infrastructures must navigate between transparency and protection (Bareikytė and Makhortykh, 2024; Jhaver et al., 2023).
This stance also allows Anna’s Archive to perform a kind of moral foresight, positioning itself as a responsible steward. The organizers are unapologetic, stating, “We are ideologues. We believe that preserving and hosting these files is morally right.” They frame copyright law as “absurd” and assert a different mandate: shadow libraries “can do things that other institutions are not allowed to do.” This “infrastructural speaking,” turning legal risk into a public narrative, contrasts sharply with the secrecy of corporate AI development and enhances the archive’s moral legitimacy.
The AI question also highlights the political significance of the archive’s commitment to metadata. In an age of LLMs, where the provenance of training data is often obscured, a well-curated, open metadata catalog is a radical political tool. Anna’s Archive does not just host files; it actively merges and cross-references metadata from disparate, siloed shadow libraries (such as Sci-Hub and LibGen) with proprietary commercial records (like WorldCat). By mapping standard identifiers such as ISBNs and DOIs directly to decentralized BitTorrent hashes, the archive performs an epistemic linkage that bypasses established institutional authorities to create a unified, illicit finding aid. This curatorial judgment shapes the archive’s epistemic structure and, as Beer (2023) noted, the interface itself mediates what counts as data. Ironically, this clarity also makes the archive especially attractive to AI developers.
This metadata work is both logistical and political. As Bowker and Star (1999) argued, classification systems produce regimes of inclusion and exclusion. LLM developers rely on the coherence of such metadata to impose structure on their training corpora, turning the archive’s catalog into a map of how texts interconnect. This makes metadata a site of infrastructural contestation. By making its metadata public, Anna’s Archive performs a “counter-archival” refusal of corporate secrecy (Thylstrup, 2022) and functions as a “critical infrastructure” (Ratto, 2011), a system that sustains alternative practices while exposing the logic of those who exploit it. The metadata catalog becomes a form of “counter-data,” a tool for critical inquiry into the very systems that seek to extract value from it.
The archive frames its SFTP access for AI developers as a means to “support the infrastructure,” yet this arrangement reveals a structural contradiction running through its legitimacy performance. Pragmatic survival may come at the cost of moral authority, as the archive effectively monetizes the very metadata it claims to liberate. At the same time, community legitimacy is strained by partnerships that redirect collectively maintained resources toward corporate ends.
In this contested new terrain, Anna’s Archive is performing a delicate balancing act across all four registers of its authority. If search systems and their infrastructures function as “supply chains” of extraction and control (Caliskan, Mackenzie, and McGowan, 2025), Anna’s Archive resists this logic by rerouting metadata toward shared utility, enacting a form of “counter-indexing” that reveals how knowledge moves. Its logging of scraping behavior and flagging of AI training patterns amount to a “surveillant assemblage” in reverse (Haggerty and Ericson, 2000). At the same time, this surveillance (logging scraping, distinguishing human users from bots) serves a community legitimacy function, demonstrating to the archive’s base that their labor is being defended against appropriation.
Conclusion
Anna’s Archive, like the samizdat networks and early shadow libraries that preceded it, is a direct response to a system of information control perceived as illegitimate. It synthesizes the embodied, moral critique of samizdat with the pragmatic, collective maintenance of post-Soviet piracy. Anna’s Archive represents a significant evolution in illicit politics precisely because it openly acknowledges its place in the political landscape through a design that is both discursively sophisticated and infrastructurally competent.
This article has argued that this performance is the key to understanding the archive’s resilience and social significance, while also exposing its desire to shape the future of the digital commons. By applying a four-part framework of pragmatic, moral, community, and cognitive legitimacy, we can see how the archive systematically builds its authority outside of legal structures. Its pragmatic legitimacy is grounded in its infrastructural competence: its comprehensiveness, its user-friendly design, and its decentralized resilience. Its moral legitimacy is constructed through a discursive strategy that frames its work as a just and necessary act of civil disobedience against a corrupt system of corporate control over knowledge. Its community legitimacy emerges from the participatory labor of maintainers whose collective investment transforms the archive from a service into a shared political project. Its cognitive legitimacy, finally, accrues through routine use: scholars encounter the archive when searching for inaccessible books or articles, and repeated use in these situations gradually normalizes the platform within research workflows. In this sense, cognitive legitimacy should be understood as a sociological process of normalization rather than as evidence of individual belief.
The challenge posed by the rise of large-scale AI has not simply provided a new set of conditions to perform and refine the archive’s political identity; it has exposed the structural limits of its legitimacy project. Where the four modes of legitimacy once operated in mutual reinforcement, AI extraction has placed them in tension. Pragmatic survival strategies risk undermining moral authority. Community labor is alienated by corporate partnerships that monetize collectively built resources. And the cognitive taken-for-grantedness of the archive is disrupted as its new role as a training corpus makes its existence newly visible and contestable. The archive’s response has been not to retreat but to attempt a normative reframing that holds these contradictions together. Anna’s Archive seeks moral legitimation by asserting its agency and advocating for a new social contract for the use of public data. The durability of this reframing, however, depends on whether the archive can sustain community trust while it simultaneously pursues pragmatic entanglements with the corporate actors its moral narrative opposes.
Unlike samizdat, whose politics of illicit knowledge centered on the circulation and resistance of individual forbidden texts, Anna’s Archive enacts an illicit politics of knowledge circulation. This is a fundamental category shift: the archive moves beyond distributing specific illicit content toward the wholesale appropriation and reengineering of the infrastructure of the knowledge economy itself.
The case of Anna’s Archive presented here offers insights for scholars of new media, platform studies, and digital politics. It demonstrates that legitimacy is not a static property conferred by law. Rather, as Ewick and Silbey (1998) argued, legitimacy is a social construction that is continuously made and unmade. Anna’s Archive is an example of this process in action, an infrastructural intervention (Plantin et al., 2018) that functions as a digital protest technology (Balan, 2024). By performing its commitment to openness and deploying powerful “infrastructural metaphors” (Cristofari, 2024), it challenges proprietary platforms on their own terms and “wills an alternative knowledge commons into being.”
The archive operates within ongoing struggles over technical control, legal risk, and infrastructural visibility. It serves as a pedagogical device that “rematerializes the politics of knowledge” (Beer, 2023), forcing us to see and confront the systems that govern our access to information. The archive exposes how classification, access control, and infrastructural choices shape knowledge practices. It challenges us to look beyond the legal status of a platform and to analyze the ways in which it builds social and moral authority. And it serves as a powerful reminder that the infrastructure of knowledge is not a settled issue, but a site of continuous and necessary struggle.
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
