Abstract
Sci-Hub hosts pirated copies of 51 million scientific papers from commercial publishers. This article presents the site’s characteristics, criticizes the perception that it is a de facto component of the Open Access movement, replicates an analysis published in Science using the site’s available usage data but limited to Latin America, and discusses the site’s implications for information professionals, universities and libraries.
Scientific articles are vital for students, professors and researchers in universities, research centers and other knowledge institutions worldwide. When academic publishing started, academies, institutions and professional associations gathered articles, assessed their quality, collected them in journals, and printed and distributed copies, all without digital technologies. Producing journals became unsustainable for some professional societies, so commercial scientific publishers emerged and took over printing, sales and distribution on their behalf, while academics retained the intellectual tasks. Elsevier, among the first of these publishers, emerged to cover operating costs and profit from sales; it is now part of an industry that grew out of the process of scientific communication, a 10 billion US dollar business (Murphy, 2016).
Many librarians and researchers have criticized the commercial nature of scientific publishing and its rising costs. This decades-old debate intensified with digital technologies, which allowed journals to be gathered in portals, known as academic databases, that facilitate searching and downloading for an individual or institutional subscription fee. These databases are currently the main delivery channel for scientific papers. Before the Web, sharing papers required photocopies, fax and the postal service. Technological advances sharpened stakeholders’ criticism of the commercial publishing model, because digital environments arguably reduce production costs, and it is now easy to host a website or an open access (OA) repository with few resources. “We are currently spending about US$ 10b annually on legacy publishers, when we could publish fully open access for about US$200m per year” (Brembs, 2016: para. 3). Email and social media also make sharing papers easy, convenient and possible on a massive scale. The cost of subscriptions to academic databases prevents many knowledge institutions from affording them. Even Harvard University Library, the academic library with the largest budget in the world, has struggled to afford subscription costs of around 3.5 million US dollars per year (Sample, 2012).
In the 21st century, criticism turned into disobedience and disruption. Aaron Swartz’s Guerilla Open Access Manifesto advocates copyright violation. Between 2010 and 2011, Swartz downloaded documents en masse from JSTOR’s database at the Massachusetts Institute of Technology. This resulted in a disproportionate prosecution, in which the United States government seemed intent on creating a landmark copyright case and making an example of Swartz, a persistent critic of copyright and an active digital rights advocate. In 2013, Swartz died by suicide in the midst of the prosecution. His death inspired ‘academic civil disobedience’, which among other things involved academics sharing their published papers on Twitter without the publishers’ consent, using the hashtag #PdfTribute; this hashtag joined #ICanHazPDF, which was adopted in 2011 and is commonly used to request papers.
Sci-Hub, its supporters and adversaries
In September 2011, Alexandra Elbakyan, a software developer and neurotechnologist from Kazakhstan, founded the website Sci-Hub, whose stated aim is ‘to remove all barriers in the way of science’. Since then, it has provided thousands of users per day with free downloads from a collection of over 51 million scientific papers (Van der Sar, 2016a). Sci-Hub has become the largest website in history to challenge publishers’ models on a massive scale. The only data needed to download articles, book chapters, monographs or conference proceedings is a document’s title, Digital Object Identifier (DOI), PubMed identifier or Uniform Resource Locator (its web address). Sci-Hub imitates the Internet Protocol (IP) addresses of institutions that subscribe to academic databases in order to download papers from them, and the ‘pirated’ copies of retrieved articles are stored on the site for future requests. Sci-Hub’s activities did not go unnoticed.
In June 2015, Elsevier, “home to almost one-quarter of the world’s peer-reviewed, full-text scientific, technical and medical content” (Devore and Demarco, 2015: 5), filed a copyright infringement complaint in New York against Sci-Hub and the Library Genesis Project (LibGen), a similar and associated website hosting scientific papers, books, standards, magazines and comic books. Following this legal action, Sci-Hub was not offline for long: after losing its original .org domain, it resurfaced under different domain names, along with an .onion address that is only accessible through Tor (software used to access websites in the hidden or deep Web) and is harder to take down. Elsevier accused Sci-Hub and LibGen of illegally accessing the accounts of students and institutions to provide free access to papers available exclusively on ScienceDirect. Elbakyan argued that publishers act illegally by “limiting the spread of knowledge by charging people to read them”, citing article 27 of the Universal Declaration of Human Rights, which recognizes the right “to share in scientific advancement and its benefits” (Henderson, 2016).
In October 2015, Elsevier won a court victory, as the ruling stated that Sci-Hub violates United States (US) copyright law. However, the site is still online; Elbakyan, who is not a US resident, is unlikely to pay any damages to Elsevier; and the lawsuit gave Sci-Hub enormous publicity, gaining it support from the Electronic Frontier Foundation (Harmon, 2015) and some researchers. In a Science survey of 11,000 researchers, 60% claimed to have used Sci-Hub, 88% considered it ‘not wrong’ to use it, and 62% believed that it disrupts the publishing industry; asked why they use the site, 50% indicated that they lacked legal access, 17% found it convenient, and 23% disagreed with publishers’ models (Travis, 2016). Sci-Hub was nominated for the Free Knowledge Award by the Russian Wikimedia chapter (Van der Sar, 2016b). Some supporters signed the open letter ‘In solidarity with Library Genesis and Sci-Hub’, which compares Elsevier to the businessman in The Little Prince by Antoine de Saint-Exupéry, a character who accumulates stars in order to buy more stars, without being of ‘use to the stars’. It states that the publishing model “devalues us, authors, editors and readers alike. It parasites on our labor, it thwarts our service to the public, [and] it denies us access” (Custodians Online Campaign, 2015: para. 4). The lawsuit probably did Elsevier more harm than good, as in similar cases in which other copyright industries (film studios and music labels) sued or launched public campaigns against disruptive sites, technologies or people. While these companies seek to stop copyright infringement, they simultaneously raise infringers’ profiles by generating newsworthy stories. The resulting publicity, an unintended side effect of defending their intellectual property, expands infringers’ user bases and illegal downloads.
Perhaps this explains why other publishers have not been as aggressive as Elsevier: it seems that “many in the publishing industry see the fight as futile” (Bohannon, 2016: 512), because “copyright lawsuits won’t stop people from sharing research” (Harmon, 2015: para. 1).
Criticism of Elsevier is hardly new; for example, universities in the Netherlands, Elsevier’s country of origin, started a national boycott. Since 2012, 16,153 researchers have signed a petition demanding that it change its business practices (The Cost of Knowledge, n.d.). The lawsuit against Sci-Hub prompted new waves of authors sharing their published papers on social media. In September 2015, before the verdict, Elsevier attempted to partner with Wikipedia to increase citations of ScienceDirect papers on the site, donating 45 accounts to Wikipedia editors. This sparked arguments among academics and OA advocates, “many of whom think that partnering with the likes of Elsevier not only goes against the spirit of Wikipedia, it could transform Wiki science articles into a front page for paywalled material” (Stone, 2015: para. 2). Although the purpose was presumably to increase its publications’ presence in Wikipedia and the chances of new sales, this would also increase published authors’ altmetrics. By the end of 2015, the entire editorial board of the journal Lingua resigned after failing to agree with Elsevier on pricing and OA models. Although not directly related, this may have been influenced by the developments cited above.
It opens access but it is not Open Access
Sci-Hub’s website states three main ‘ideas’ behind it: knowledge to all, no copyright, and OA. Regarding the latter, it declares: “The Sci-Hub project supports Open Access movement in science. Research should be published in open access, i.e. be free to read. The Open Access is a new and advanced form of scientific communication, which is going to replace outdated subscription models. We stand against unfair gain that publishers collect by creating limits to knowledge distribution”.
Researchers, librarians and OA advocates may understand the difference between Sci-Hub and OA very well, but the general public and mainstream media may not. For example, Murphy (2016) states in the New York Times that Elbakyan’s “protest against scholarly journals’ paywalls has earned her rock-star status among advocates for open access” (para. 2), but the OA community is so divided about Sci-Hub that such a claim is impossible to support. Equating the infringement of publishers’ rights with OA may harm OA’s image and hinder its further progress. Peter Suber stated that Sci-Hub may have a “‘strategic cost’ for the open-access movement, because publishers may take advantage of ‘confusion’ over the legality of open-access scholarship in general and clamp down. Lawful open access forces publishers to adapt (…) whereas unlawful open access invites them to sue” (Bohannon, 2016: 512). OA advocates, who may already be seen as pirates, may need to perform some damage control. Priego (2016) argues that Sci-Hub is not what OA is about; rather, it is a signal of the current state of scientific publishing, and the fact that it has been wrongly seen as a solution to that state indicates how little progress the OA movement has achieved since the Budapest declaration 14 years ago, “an example of a collective failure to communicate successfully the principles of openness to the mainstream” (Priego, 2016: para. 9). For Brembs (2016), OA efforts “to wrestle the knowledge of the world from the hands of the publishers, one article at a time, has resulted in about 27 million (24%) of about 114 million English-language articles becoming publicly accessible by 2014” (para. 4), while Elbakyan single-handedly made 48 million articles accessible.
Sci-Hub forcibly removes the price of access and download, but that is not OA, because the research it makes available was originally published with technical, moral, social and legal restrictions (Priego, 2016), as opposed to research made available under an open licensing scheme such as Creative Commons. This argument is crucial: papers appearing in a commercial journal were published there for various reasons (which are widely discussed and contested) and are therefore subject to the very specific restrictions stated in the license agreements that publishers sign with authors. OA is an alternative to that model, not an illegal counterstrike against it.
Elbakyan (2016b) has contested Priego’s (2016) position on whether Sci-Hub is OA, arguing that the site both supports OA and is OA because it grants access to ‘paywalled’ documents: a ‘whatever it takes’ approach to OA. There are subtler, less aggressive alternatives for protesting against publishers that neither damage OA’s image nor infringe copyright law. Open Access Button is a plugin that, when used on the web page of a commercially published article, searches for an OA version. Google Scholar does something similar, as it links metadata from commercial publishers, OA repositories and other academic networks; if there is an OA version of a given article, Scholar provides links to both versions. However, these and other alternatives for accessing OA documents, such as DOAJ, Open Access Theses and Dissertations, and OpenDOAR, depend on the success of OA in general, and of Green OA and researchers’ self-archiving in particular.
Usage data analysis
Bohannon (2016) wanted to answer three questions: “who are Sci-Hub’s users, where are they, and what are they reading?” He asked Elbakyan for data, and together they produced a dataset from Sci-Hub’s server logs, which was later made available in the Dryad Digital Repository (Elbakyan and Bohannon, 2016). The dataset contains three components: a) Sci-Hub’s server logs from September 2015 to February 2016, with 28 million download requests; b) an IPython Notebook file that can help in processing the data; and c) a table of publishers’ names with their corresponding DOI prefixes, taken from the CrossRef website. We replicated Bohannon’s (2016) analysis published in Science, but limited it to data pertaining to Latin America.
For each of the 28 million transactions, the server logs record the date and time, the requested DOI, and the country, city and coordinates where the request originated. We imported the six monthly tables of server logs into a database manager, discarding the city and coordinate fields, which were unnecessary for our purposes. We added a table listing 32 countries of the region to restrict the usage data to Latin America. This produced six monthly tables with the number of downloads per country, from which we obtained downloads per country and for all countries combined, both per month and for the whole six-month period. Figure 1 shows the monthly download trend. The dataset is missing 18 days of data in November, when the site switched domains because of Elsevier’s lawsuit.

Figure 1. Latin American Sci-Hub downloads per month.
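The filtering and monthly aggregation described above were carried out in a database manager; the Python sketch below shows the same logic, as an illustration only and not the authors’ procedure. It assumes each log entry has already been reduced to a `(timestamp, doi, country)` tuple with timestamps formatted as `YYYY-MM-DD HH:MM:SS` (an assumption about the layout of the Dryad files), and `LATIN_AMERICA` is a hypothetical excerpt of the 32-country lookup table. The `missing_days` helper shows how a gap such as the 18 missing days in November could be detected.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Hypothetical excerpt of the 32-country lookup table used to restrict the logs.
LATIN_AMERICA = {"Brazil", "Mexico", "Chile", "Colombia", "Argentina", "Peru"}

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def monthly_downloads(records, countries=LATIN_AMERICA):
    """Count downloads per (month, country), keeping only the given countries.

    Each record is assumed to be a (timestamp, doi, country) tuple; city and
    coordinates are assumed to have been discarded already.
    """
    counts = Counter()
    for timestamp, _doi, country in records:
        if country not in countries:
            continue  # request originated outside the region
        month = datetime.strptime(timestamp, TIMESTAMP_FORMAT).strftime("%Y-%m")
        counts[(month, country)] += 1
    return counts


def regional_totals(counts):
    """Collapse per-country counts into one regional total per month."""
    totals = Counter()
    for (month, _country), n in counts.items():
        totals[month] += n
    return totals


def missing_days(records, start: date, end: date):
    """List the days between start and end (inclusive) with no log entries."""
    seen = {datetime.strptime(ts, TIMESTAMP_FORMAT).date() for ts, _, _ in records}
    gaps, day = [], start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

Summing `regional_totals` over all months gives the regional figure for the whole period, and summing `monthly_downloads` per country gives each country’s total.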
Summing each country’s monthly downloads in the six monthly tables gave its total for the period. Brazil, with more than a million downloads, heads the list (29.09% of regional downloads), followed by Mexico (14.32%), Chile (12.12%), Colombia (11.81%), Argentina (11.70%), Peru (10.63%), Ecuador (3.85%) and Venezuela (3%); each of the other countries accounts for less than 1%. Table 1 shows downloads per country and their percentage of total regional downloads. There were 28 million downloads worldwide, but Bohannon (2016) stresses that it is difficult to determine how they compare to legal downloads, because such numbers are not publicly available. Nevertheless, he cites a 2010 Elsevier report that estimated over a billion downloads for all publishers that year; if so, Sci-Hub downloads would represent a small fraction. There were 3,512,109 downloads in Latin America, about 12.54% of the worldwide number. This was surprising: we had expected higher numbers for a developing region in which many countries and institutions cannot afford academic databases. The US ranks fifth in number of downloads, and a quarter of downloads come from the “34 members of the Organization for Economic Cooperation and Development, the wealthiest nations with, supposedly, the best journal access (…) intense use of Sci-Hub appears to be happening on the campuses of U.S. and European universities” (Bohannon, 2016: 510). Sci-Hub users therefore do not appear to be only less privileged researchers in countries with limited access, so other factors may be at play. For instance, Greshake (2016) found positive correlations between downloads and countries’ population size, gross domestic product and number of Internet users; although he found exceptions, such correlations could explain Brazil’s place among the top downloaders.
Table 1. Sci-Hub downloads per country.
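The country shares and the regional share of worldwide downloads are plain percentages of a total. A minimal Python sketch, assuming per-country totals are already available in a dictionary (the function names are illustrative, not taken from the authors’ workflow), using the Mexican and Argentine totals and the regional and rounded worldwide figures reported in this article:

```python
def percent(part, whole, digits=2):
    """Return `part` as a percentage of `whole`, rounded to `digits` places."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)


def country_shares(totals):
    """Rank countries by downloads and attach each one's share of the regional total."""
    regional = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(country, n, percent(n, regional)) for country, n in ranked]


REGIONAL_TOTAL = 3_512_109    # Latin American downloads, Sept 2015 to Feb 2016
WORLDWIDE_TOTAL = 28_000_000  # rounded worldwide figure used in the text

print(percent(503_093, REGIONAL_TOTAL))          # Mexico's share: 14.32
print(percent(410_986, REGIONAL_TOTAL))          # Argentina's share: 11.7
print(percent(REGIONAL_TOTAL, WORLDWIDE_TOTAL))  # region's share of the world: 12.54
```

Because the worldwide figure is rounded to 28 million, the regional share is approximate.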
We expected higher downloads from the region because obtaining the funds needed for subscriptions is difficult, and many countries face other challenges, including financial and infrastructure limitations, bureaucracy, difficulties with providers, limited promotion of subscribed resources, lack of appropriate staff, and information or digital literacy deficiencies among users and even librarians. An information divide separates universities that can access databases from those that cannot, which may affect the success and classification of universities within the same country. Moreover, if Sci-Hub replaces databases as the place to download academic papers, there is no way libraries can “properly track usage for the journals they provide and could wind up discontinuing titles that are useful to their institution” (McNutt, 2016: 497), which makes it even more difficult to justify the funds needed for subscriptions.
Some countries have government-run, national-level acquisition systems, in which affiliated institutions contribute to a public consortium fund that negotiates subscriptions to commercial academic databases on behalf of each institution. In Argentina, the Ministry of Science, Technology and Productive Innovation (MinCyT) maintains the Electronic Library of Science and Technology, which provides access to government institutions, public universities and some private universities (non-profits with doctoral programs and public accreditation). The Mexican National Council of Science and Technology (CONACyT) and other institutions established the National Consortium of Scientific and Technological Resources (CONRICyT), which provides access to public research centers, health and academic institutions, and some private institutions that meet state capacity and competitiveness indicators and accreditations. Because these two consortia publish yearly reports, they allow Sci-Hub downloads to be compared with legal downloads, although the comparison sets six months of Sci-Hub logs (September 2015 to February 2016) against a calendar year of legal downloads. In Argentina, there were 3,094,943 legal downloads during 2015 (Biblioteca Electrónica de Ciencia y Tecnología, 2016), so its 410,986 Sci-Hub downloads represent 13.28% of this number, while Mexico had 21,727,633 legal downloads during 2015 (CONRICyT, 2016), so its 503,093 Sci-Hub downloads represent 2.32%. However, few countries have national consortia. Colombia has not established a country-level system, but some libraries have formed consortia to acquire subscriptions and save resources. In Venezuela, strict government currency exchange controls prevent almost any public or private institution not directly affiliated with the state from obtaining the US dollars needed to acquire subscriptions, even if it has the necessary funds in national currency.
As a result, many Venezuelan universities have been unable to acquire subscriptions for years, seriously hindered by currency controls, budget limitations, and the state’s lack of interest in, and of policies for, supporting research and access to scientific resources.
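The comparison with the Argentine and Mexican consortium reports reduces to a ratio of Sci-Hub downloads to legal downloads. The short Python sketch below recomputes it from the figures given in the text, rounding to two decimals; it sets six months of Sci-Hub logs (September 2015 to February 2016) against a calendar year of consortium downloads (2015), so the ratios are indicative only.

```python
# Figures reported in the text: Sci-Hub downloads (Sept 2015 to Feb 2016) and
# downloads through the national consortia during calendar year 2015.
COMPARISON = {
    "Argentina": {"sci_hub": 410_986, "legal": 3_094_943},
    "Mexico": {"sci_hub": 503_093, "legal": 21_727_633},
}


def sci_hub_ratio(sci_hub, legal, digits=2):
    """Sci-Hub downloads as a percentage of legal (consortium) downloads."""
    return round(100 * sci_hub / legal, digits)


for country, figures in COMPARISON.items():
    print(f"{country}: {sci_hub_ratio(figures['sci_hub'], figures['legal'])}%")
```

This prints `Argentina: 13.28%` and `Mexico: 2.32%`.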
The Sci-Hub data made it possible to determine downloads per publisher, using the DOIs in the monthly tables and the table of DOI prefixes taken from CrossRef. The DOI was instrumental for this: its first characters (‘10.’) are the directory indicator of the DOI system, which is managed by the International DOI Foundation; the following characters, up to the slash (‘/’), are the registrant code that identifies the publisher; and the characters after the slash, the suffix, are assigned by the publisher to identify the specific item, often encoding the journal and article. Figure 2 illustrates the DOI components.

Figure 2. Anatomy of a DOI.
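The DOI components shown in Figure 2 can be separated with a small parser. This is a hedged Python sketch, not the code used in the study: it splits on the first ‘/’ only, because DOI suffixes may themselves contain slashes, and the example DOI in the usage note is hypothetical.

```python
from typing import NamedTuple


class DOIParts(NamedTuple):
    directory: str   # directory indicator, always "10" for DOIs
    registrant: str  # registrant code identifying the publisher (e.g. "1016")
    suffix: str      # identifier assigned by the registrant to the item


def parse_doi(doi: str) -> DOIParts:
    """Split a DOI into directory indicator, registrant code and suffix."""
    prefix, sep, suffix = doi.strip().partition("/")  # split on the first '/' only
    directory, dot, registrant = prefix.partition(".")
    if not sep or not dot or directory != "10" or not registrant or not suffix:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return DOIParts(directory, registrant, suffix)
```

For the hypothetical DOI `10.1016/j.example.2015.01.001`, `parse_doi` returns `DOIParts(directory='10', registrant='1016', suffix='j.example.2015.01.001')`; the publisher lookup only needs the prefix, `directory + "." + registrant`.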
We programmed a process that keeps the characters of each DOI before the ‘/’ (the prefix) and discards the rest, replaces each prefix with the corresponding publisher’s name from the CrossRef table, and then sums the downloads for each publisher. The largest academic publishers top the list: there were over 1.3 million downloads of Elsevier papers (38.09% of the total), followed by Springer (11.85%), Wiley Blackwell (10.10%), Nature Publishing Group (4.43%), and the American Chemical Society (3.99%). However, other large publishers rank lower; for instance, SAGE Publishing is eighth (1.59%) and Informa UK (Taylor & Francis) is eleventh (1.21%). The data lists Springer separately from Nature Publishing Group and Palgrave Macmillan, which merged with it in 2015 to form Springer Nature, so the combined company would account for more downloads than Springer alone. Table 2 lists the main publishers with their downloads and the percentage of the total these represent. Fifteen publishers are listed individually; all others are grouped together and account for 18% of total downloads.
Table 2. Latin American Sci-Hub downloads per publisher.
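The prefix-to-publisher step and the grouping of smaller publishers can be sketched in Python as follows. It assumes the CrossRef-derived table has been loaded into a dictionary keyed by prefix; `PREFIX_TO_PUBLISHER` here is only an illustrative excerpt with a few well-known prefixes, and the catch-all label `OTHER` is hypothetical. Counts are keyed by publisher name rather than by prefix because one publisher may own several prefixes (Wiley, for example, uses both 10.1002 and 10.1111).

```python
from collections import Counter

# Illustrative excerpt of a prefix-to-publisher table; the analysis used the
# full CrossRef-derived table distributed with the dataset.
PREFIX_TO_PUBLISHER = {
    "10.1016": "Elsevier",
    "10.1007": "Springer",
    "10.1002": "Wiley Blackwell",
    "10.1111": "Wiley Blackwell",
    "10.1038": "Nature Publishing Group",
    "10.1021": "American Chemical Society",
}
OTHER = "Other publishers"  # hypothetical catch-all label


def downloads_per_publisher(dois, table=PREFIX_TO_PUBLISHER):
    """Keep each DOI's prefix, map it to a publisher name, and count downloads."""
    counts = Counter()
    for doi in dois:
        prefix = doi.strip().split("/", 1)[0]  # discard everything after the first '/'
        counts[table.get(prefix, OTHER)] += 1  # unknown prefixes go to the catch-all
    return counts


def top_n_with_other(counts, n=15):
    """Keep the n publishers with most downloads and fold the rest into OTHER."""
    ranked = counts.most_common()
    grouped = Counter(dict(ranked[:n]))
    rest = sum(c for _, c in ranked[n:])
    if rest:
        grouped[OTHER] += rest
    return grouped
```

With `n=15`, `top_n_with_other` mirrors the layout of a table that lists fifteen publishers individually and groups the remainder.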
The final process performed with the DOIs was to sum the downloads of each distinct document. The 3.5 million regional downloads of the period corresponded to 2,093,371 distinct papers, each downloaded between one and 366 times. Table 3 shows the ten most downloaded papers of the period. Eight of these are from the field of medicine, one from chemistry and one from biology. They were published in The New England Journal of Medicine (4), The Journal of the American Medical Association (2), Concepts in Magnetic Resonance (1), Journal of Bacteriology (1), Nature Reviews Microbiology (1), and The Lancet (1). The New England Journal of Medicine is among the most downloaded journals, even though it makes its articles free after a six-month embargo. Interestingly, none of these top ten documents appear in the similar list compiled by Bohannon (2016), which has four papers related to Medicine, while the rest are from Engineering, Physics and Biology.
Table 3. Top 10 most downloaded documents.
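Summing downloads per distinct document and ranking them is a counting problem. A minimal Python sketch, assuming the Latin American DOIs are available as a list with one entry per download; because DOI names are case-insensitive, the sketch lower-cases them before counting, which is an assumption since the text does not say whether the original processing did so.

```python
from collections import Counter


def downloads_per_document(dois):
    """Count downloads of each distinct document, treating DOIs case-insensitively."""
    return Counter(doi.strip().lower() for doi in dois)


def summarize(counts, n=10):
    """Number of distinct documents, min/max downloads per document, and the top n."""
    return {
        "distinct": len(counts),
        "min": min(counts.values()),
        "max": max(counts.values()),
        "top": counts.most_common(n),
    }
```

Applied to the regional data, `summarize` would report the number of distinct papers, the range of downloads per paper, and the ten most downloaded DOIs, which can then be looked up to build a table like Table 3.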
One analysis that Bohannon (2016) did not explore is using DOIs to retrieve the details (title, journal, research field) of a larger sample of papers. We selected the papers downloaded at least 50 times. This allows ‘article-level’ statements but requires looking up each paper manually by its DOI; it is useful for giving a more qualitative, although limited, picture of the papers downloaded in the region. This group of 629 papers accounts for 52,579 downloads, a seemingly large number that nonetheless represents only 1.5% of total downloads. The mean publication year is 2012, and most of these papers are very recent: 2016 (66), 2015 (285), 2014 (90), 2013 (36), 2012 (34), 2011 (16), 2010 (13), 2009-2006 (33), 2005-2000 (29), 1999-1990 (12), 1987-1983 (5), and pre-1980 (10). Elsevier accounts for the most papers in this sample (320), followed by the Massachusetts Medical Society (104) and Springer Nature (91). The journals with the most papers in the sample are The New England Journal of Medicine (104) and Medicine - Programa de Formación Médica Continuada Acreditado (103). Most papers were from Medicine (481), followed by Biology and Ecology (82), Psychology and Neuroscience (27), Chemistry (20), Engineering (10), Physics (6), Political Science (2), and Information Science (1). Looking at specific topics, we were alarmed to find that many of the downloaded papers concern Zika, chikungunya, dengue, Chagas disease, tuberculosis, diabetes and asthma. This raises questions: Are health specialists in the region deprived of the much-needed means to access these documents legally? Are these articles not important enough for global and regional health and safety to be OA by default?
Other questions follow: has Green OA sufficiently disrupted the commercial system and made papers available (Sci-Hub’s usage data would suggest not), or do our researchers lack the skills, or are they too preoccupied with other matters, to self-archive their publications?
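The article-level sample discussed above (papers downloaded at least 50 times) is a threshold filter over per-document download counts. The Python sketch below assumes those counts are in a dictionary keyed by DOI; the publication years, which the authors obtained by looking up each DOI manually, are represented by a hypothetical DOI-to-year lookup dictionary.

```python
from statistics import mean


def threshold_sample(counts, threshold=50):
    """Keep only documents downloaded at least `threshold` times."""
    return {doi: n for doi, n in counts.items() if n >= threshold}


def sample_share(sample, counts, digits=1):
    """Percentage of all downloads accounted for by the sampled documents."""
    return round(100 * sum(sample.values()) / sum(counts.values()), digits)


def mean_publication_year(sample, years):
    """Mean publication year of the sampled papers, given a DOI -> year lookup."""
    return round(mean(years[doi] for doi in sample))
```

With the totals reported in the text (52,579 sample downloads out of 3,512,109), `sample_share` gives 1.5.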
What should we do about it?
Sci-Hub represents a threat both to publishers and to OA, which seeks to make science open from its origin rather than to completely disrupt legally protected elements. There is a risk that publishers could use Sci-Hub as an instrument to discredit OA, arguing that OA equals pirated access, in addition to the mistaken idea that OA publications are of lower quality. Latin American OA advocates may need to foster the ‘academic civil disobedience’ required to disrupt the system in more legitimate ways, as the most prominent developments seem to happen in the Anglo-Saxon world. Besides Aaron Swartz, there is the case of the Colombian Diego Gómez, who shared another author’s thesis on Scribd.com; the thesis author filed a lawsuit claiming damage to his economic rights, which started a landmark regional case, and Gómez received support from national and international digital rights and OA organizations (Harmon, 2016). Latin America’s OA leadership was achieved through its early and widespread adoption, reflected in our repositories, journals and many publications analyzing OA research impact and indicators (Babini and Machin-Mastromatteo, 2015). Sci-Hub might damage these advances, not because it grants access, but because it distorts what OA is about and might lead people to stop using the OA platforms we have built with so much effort.
Curiously, discussions of Sci-Hub reached the region mainly through media outlets and English-language publications. However, librarians, researchers, universities and governments should start discussing: a) current issues with the dichotomies of scientific publishing, mainly open versus closed and bibliometrics versus altmetrics; b) strengthening institutional, national and regional OA policies, including mandatory self-archiving in our knowledge institutions; c) educating people about the difference between OA and Sci-Hub’s kind of ‘open access’; d) raising awareness of OA, its advantages and alternative models among students, professors, governments and the general public through new training spaces, from an information-academic-scientific literacy perspective (Uribe-Tirado, 2015), since, surprisingly, some OA papers were downloaded from Sci-Hub; e) self-criticism by librarians, asking whether, when promoting resources, conducting information literacy activities and developing our libraries’ websites, we include OA resources or focus exclusively on commercial ones; f) regional initiatives to improve health specialists’ access to their scientific literature, given the alarmingly large number of medical papers downloaded from Sci-Hub, for example by moving or creating more medical journals in OA and promoting this option among these specialists; and g) how to improve the government-administered scientific production systems that determine how research is assessed, measured, fostered and rewarded. These systems are hotly debated (for instance, in Colombia) because they still treat publishing in journals indexed in Web of Science, and the impact factor, as the sole or most important indicators for evaluating researchers, while many academics are convinced this has to change, as these indicators reflect journals’ market position rather than the quality of individual papers. Each of these issues is complex enough to warrant further discussion.
