Lessons from the Library: Extreme Minimalist Scaling at Pirate Ebook Platforms

Abstract:  At 33TB of data in its main collection, the highly illegal Library Genesis project is one of the largest repositories of copyright-violating educational ebooks ever created. Established over a decade ago in 2008, the goal of Library Genesis is nothing short of a modern Library of Alexandria, albeit without anyone’s legal sanction. As one of its administrators wrote: “within decades, generations of people everywhere in the world will grow up with access to the best scientific texts of all time. […] [T]he quality and accessibility of education to the poor will grow dramatically too. Frankly, I see this as the only way to naturally improve mankind: we need to make all the information available to them at any time” [Bodó 2018b]. Rooted in its homeland’s Russian communist principles and particularly the Soviet isolationist copyright policies of the twentieth century, Library Genesis is a formidable resource and threat to conventional academic publishers.

The Library Genesis database had just short of 1.2m records (books) in 2014 [Bodó 2018a]. As of January 2020, this figure has roughly doubled to 2.5m books. In this article, I examine the minimal computational design choices taken by this maximal-in-intent, illicit archive of epistemological dissent and how such decisions have shaped the scalability and growth of the platform. This includes Library Genesis’s numerical subdivision of record identifiers into “buckets” to work around directory file limitations in the GNU/Linux operating system; its use of MD5 hashing of filenames within directories capped at 1,000 files to avoid future hashing collisions while allowing for on-disk integrity checking; and its use of the MySQL socket/network server as opposed to SQLite or a similar disk-based database.
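The bucketing and hashing scheme described in this abstract can be illustrated with a minimal sketch. It is not taken from the Library Genesis codebase: the function names, the repository root, and the choice to name each bucket directory after the first identifier it holds are all hypothetical. The sketch also assumes that a file's name is the MD5 digest of its contents, which is one reading of "md5 hashing of filenames" that would make on-disk integrity checking possible.

```python
import hashlib
from pathlib import Path

# Per the abstract, directories are capped at 1,000 files.
BUCKET_SIZE = 1000


def bucket_for(record_id: int) -> int:
    """Round a record ID down to the first ID of its 1,000-wide bucket (assumed naming)."""
    return (record_id // BUCKET_SIZE) * BUCKET_SIZE


def md5_of(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def storage_path(root: Path, record_id: int, data: bytes) -> Path:
    """Build a hypothetical on-disk location: <root>/<bucket>/<md5 of contents>."""
    return root / str(bucket_for(record_id)) / md5_of(data)


def verify(path: Path) -> bool:
    """Integrity check: re-hash the file and compare the digest with its filename."""
    return md5_of(path.read_bytes()) == path.name
```

Under these assumptions, `bucket_for(1234567)` returns `1234000`, so no directory ever holds more than 1,000 files, and `verify` can detect on-disk corruption without consulting the database.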

Beyond these computational details, though, the theoretical tension that this article highlights lies between the path dependencies that are set in (illegal) computational projects with goals of absolute abundance and maximalist capacity, and the minimalist design principles that they must adopt at the outset to ensure a degree of scalability. I also query the ways in which the project’s contested mission statements target an economic (geographic) audience demographic with only minimalist access to high-capacity computing resources. I finally examine the limits on the scalability of distributing Library Genesis through its torrent archive and other distributed networking technologies such as IPFS, which, despite their promise of peer-to-peer redundancy, fall down on an archive of this size.

Negotiating Open Access: Ethical Positions and Perspectives: Library & Information Science Book Chapter | IGI Global

Abstract:  In this chapter, the authors interrogate the discursive terrain of the open access phenomenon to position the processual as well as the discourse communities that open access is inevitably enmeshed in. The essay explores the current climate of open access and investigates the ethical dilemmas that its subversive sibling of guerrilla open access foregrounds. Further, the essay also recommends a viable model that can be deployed by state players as an exemplar of academic socialism that is flexible, accommodative, and a true reflection of the open-access philosophy which also counters the development of otherwise illegal and ‘pirate’ models of open access.


Open Access Redefined: Survey Data and Literature Study on the Impact of Sci-Hub in Orthopaedic Research

Abstract:  Background Since Alexandra Elbakyan founded Sci-Hub in 2011, the website has been used by a growing number of researchers worldwide. Sci-Hub is a so-called shadow library or guerrilla open access format bypassing publishers’ paywalls, giving everyone free access to scientific papers. Until today, there have been no publications about usage by orthopaedic and trauma surgeons of Sci-Hub or other “pirate sites” and how it may influence their work.

Materials and Methods Orthopaedic and trauma surgeons of four university hospitals in Germany and Europe were consulted using a standardised questionnaire containing multiple items about the use and evaluation of Sci-Hub. In addition, the Medline and Cochrane databases were screened for all studies related to Sci-Hub. Two reviewers independently reviewed all articles and the references of these articles.

Results Of all orthopaedic surgeons consulted, 69% knew of Sci-Hub and 66.7% used it on a regular basis. Of the younger participants (< 45 years old), 77% knew the webpage, while only 25% of older participants (> 45 years old) knew the webpage. Ninety percent found the quality of their citation and research had been enhanced since using Sci-Hub. On a scale of 1 to 10, user-friendliness was rated with a mean rating of 7.58 (95% CI: 7.262–7.891). Ethical or legal concerns among users seem mixed. On a scale of 1 (no concerns) to 5 (many concerns), the mean score was 2.39 (95% CI: 2.154–2.615). Of doctors using Sci-Hub, 89% would recommend it to other colleagues.

Conclusion The quality and number of articles in Sci-Hub is outstanding, and the rate of young researchers using the website is high. The most important shift in literature research for decades is a phenomenon mostly used by young researchers and is not the subject of current research itself. Sci-Hub may have already changed how orthopaedic research works.

FBI Gains Access to Sci-Hub Founder’s Google Account Data * TorrentFreak

“Sci-Hub founder Alexandra Elbakyan says that following a legal process, the Federal Bureau of Investigation has gained access to data in her Google account. Google itself informed her of the data release this week, noting that due to a court order, the company wasn’t allowed to inform her sooner….

In an email to Elbakyan dated March 2, 2022, Google advises that following a legal process issued by the FBI, Google was required to hand over data associated with Elbakyan’s account. Exactly what data was targeted isn’t made clear but according to Google, a court order required the company to keep the request a secret….”


Meet the ‘Pirate Queen’ fighting to kill paywalls on research | Digital Trends

“In 2011, Alexandra Elbakyan, then a 22-year-old student in Almaty, Kazakhstan, got fed up with this system and decided to throw a wrench in the gears. She created a program called Sci-Hub, a website reminiscent of The Pirate Bay that allows users to circumvent paywalls and download research articles for free….

Now, 10 years after she founded Sci-Hub, Elbakyan, who has been referred to as a “pirate queen” and “Robin Hood,” has found herself bogged down in lawsuits and investigations while she fights to provide the open access service that has become essential to the scientific community, particularly during the COVID-19 pandemic….

Currently, Sci-Hub has over 84 million papers in its database, according to its website, and users generally download between two million and three million each day. Elbakyan has observed that more scientific articles are available in open access than ever before, due to the influence of her work. But Sci-Hub continues to be embroiled in lawsuits and investigations. In January 2020, Sci-Hub’s Twitter account was suspended for violating the site’s counterfeit policy. And Sci-Hub has frozen downloads during the trial in India.”


Sci-Hub downloads show countries where pirate paper site is most used

“Download figures for Sci-Hub, the popular but controversial website that hosts pirated copies of scientific papers, reveal where people are using the site most. The statistics show that users accessing Sci-Hub from China are by far the most active — and that with more than 25 million downloads, usage in China outstrips the rest of the top ten countries combined (see ‘Global resource’).

Perhaps surprisingly, the figures also show that the United States, in second place, has about one-third as many downloads, at 9.3 million. “There is a widespread opinion that Sci-Hub is of no use in the United States, because universities have money to pay for subscriptions, but that is not true,” says Alexandra Elbakyan, the site’s founder.

The statistics are updated daily and show the number of downloads from each country over the past month — but they are not normalized for the size of the research population….”

Full article: A Librarian’s Perspective on Sci-Hub’s Impact on Users and the Library

Abstract:  On December 19, 2019, The Washington Post reported that the U.S. Justice Department is investigating the founder and operator of Sci-Hub Alexandra Elbakyan on suspicion of working with Russian intelligence to steal U.S. military secrets from defense contractors. The article further discusses Sci-Hub’s methods for acquiring the login credentials of university students and faculty “to pilfer vast amounts of academic literature.” This has long been public knowledge. But the confirmation of Sci-Hub potentially working with Russian intelligence was major news. Both fronts of the Sci-Hub assault on stealing intellectual property are concerning. Since many academic researchers and their employers routinely receive defense contracts to perform sensitive research, the article helped posit that offering free access to academic research articles is perhaps a Trojan Horse strategy for Sci-Hub. To add to The Washington Post’s report, we sought out individuals at universities with a vantage point on Sci-Hub’s activities to see if there is independent evidence to support the report. We spoke to Dr. Jason Ensor who at the time of this interview was Manager, Engagement Strategy and Scholarly Communication, Library Systems at Western Sydney University Library in Australia. Ensor holds four degrees in related critical thinking fields and is an experienced business professional in software development, data scholarship and print publishing. He is also a distinguished speaker on digital humanities and linked fields, presenting regularly in national and international forums.


Sci-Hub Blocking: Court Denies Researchers’ Application to Intervene * TorrentFreak

“Three researchers who sought to intervene in a court case that will determine whether Sci-Hub will be blocked by ISPs in India have had their application rejected. They argued that blocking access to copyrighted research papers would affect their work and harm the public interest. The judge found that an intervention could not be made on that basis….”

Sci-Hub Case: Delhi High Court Rejects Researchers’ Plea Seeking Impleadment In Infringement Proceedings

This article reports a recent development of the ongoing court case between Sci-Hub and Libgen and three publishers, Elsevier Ltd, Wiley India Pvt Ltd, and the American Chemical Society. The Delhi High Court has rejected an application filed by three researchers seeking impleadment in the ongoing infringement proceedings. 


“The Delhi High Court has rejected an application filed by three researchers seeking impleadment in the ongoing infringement proceedings in the Sci Hub case.

Justice C Hari Shankar rejected the impleadment application filed by Prof. Subbiah Arunachalam, Prof. (Dr.) Padmanabhan Balaram and Mr. Madhan Muthu claiming to be eminent researchers and scientists holding various coveted academic positions at some of the most prestigious universities in India.

The impleadment was sought in the suit filed by publishing houses Elsevier Ltd, Wiley India Pvt Ltd, and American Chemical Society against online repositories Sci Hub and Libgen (another online repository of science articles) over alleged copyright infringement.

The applicants had supported the legality of SciHub and Libgen.

“In my view such intervention cannot be permitted under Order I Rule 8A of the CPC. If such intervention is permitted it would be a carte blanche for persons, who claim to be beneficiaries of material which is alleged to be infringing in nature, to start intervening in the infringement proceedings, which would seriously impact the prosecution of the proceedings in the Court,” the Court ordered….”


International Tensions and “Science Nationalism” in a Networked World: Strategies and Implications

“The Coalition for Networked Information (CNI) Executive Roundtable that took place as part of the CNI Fall 2020 Virtual Membership Meeting examined the collision between developing international tensions and science nationalism on one side, and trends towards global, network-based collaboration and scholarly communication, particularly as driven by the adoption of open science practices, on the other….

There is a very broad-based effort to restructure the terms of open access (OA) publishing across the globe through so-called “transformative agreements” and efforts such as the European Union-based Plan S, which stipulates (among other things) that scientific publications resulting from publicly funded research be published in OA journals or platforms. Currently there’s a rough and still tentative alignment between the US and Europe on this effort; in particular, there is some ambiguity about the extent of support by US federal funders, as distinct from research universities (who have a wide range of views), for the Plan S style approach. Given the scale of publishing by Chinese researchers, it seems likely that unless China supports this restructuring effort, the economics globally will be at best problematic. While a few years ago some Chinese scholarly organizations seem to have expressed conceptual support for both this kind of OA and related initiatives about open research data, it’s unclear where this commitment now stands, or how it may relate to other emerging Chinese scholarly publishing strategies….

Some recent policy announcements seem to suggest that China is de-emphasizing the importance of publishing in very high prestige Western journals; interestingly, this is being cast as consistent with the efforts of Western and global open science advocates to focus assessments of scholarly impact on quality rather than quantity, and to de-emphasize measures such as the impact factor of the journals that results are published in. Note that to the extent that China is, or may be, investing in a national publishing infrastructure, this implies shifting investment away from contributions that might support a global restructuring of the Western scholarly publishing system (discussed above) towards new OA models. …”

DOI (Digital Object Identifier) for Systematic Reviewers and other Researchers: Benefits, Confusions, and Need-to-Knows | by Farhad | Jan, 2022 | Medium

“DOI enhances the accessibility, discoverability, trustability, and interoperability of digital objects and serves the openness and visibility of professionally published content. While I am not a DOI expert, I know about it because I use it a lot in my profession. I believe DOI will play a significant role in the automation of literature reviews. More than it does now.

It is the responsibility of librarians, information specialists and other information professionals to raise awareness about the benefits of DOI. …”

Toll-based access vs pirate access: a webometric study of academic publishers | Emerald Insight

Abstract:  Purpose

The purpose of this paper is to draw a comparison of the Web traffic ranking, usage and popularity of websites of databases of reputed publishers, namely, ScienceDirect and Emerald Insight, that provide access on subscription basis with Sci-Hub, on the basis of data obtained from Alexa databank (www.alexa.com). Sci-Hub is a website that provides pirated open-access to the research literature, where piracy, according to The Economic Times (2020), refers to the unauthorized duplication of copyrighted content.


In the present study, the quantitative analysis of the collected data was carried out with the help of descriptive research methodology. The Alexa databank was singled out as the source of data. This study crawled through the Alexa databank on 01.12.2019 and collected relevant data regarding Sci-Hub, ScienceDirect and Emerald Insight using the search terms Sci-hub.tw, Sciencedirect.com and Emeraldinsight.com sequentially. Different criteria were taken into consideration, including global traffic rank, the average number of page views per user, time taken for uploading, bounce rate, percentage of users, the number of in-links and daily time spent on the site.


The results of this study showed that ScienceDirect has the highest traffic rank and the most in-linking sites among the surveyed databases. However, the highest number of page visits was recorded for Sci-Hub, which also had the fastest downloading speed. It has also been observed that users spent less time on ScienceDirect and Emerald Insight than on Sci-Hub. This study further observed that Sci-Hub has the lowest bounce rate. Users from both developing and developed economies use Sci-Hub, though the highest number of visitors comes from developing nations.


This study provides an overview of the performance of toll-based publishing databases compared with a pirated database, based on different criteria measured through the World Wide Web. Although this study in no way supports or endorses unauthorized and illegal access to knowledge, such data helps in depicting and analyzing how much a particular database is accessed by its users all over the globe. It also determines and illustrates the time spent by users while accessing a specific database, thus indicating user preferences in information-seeking activities. This study provides an overall view of the adoption of open resources.