The General Index: Search 107 million research papers for free – Big Think

“Millions of research papers get published every year, but the majority lie behind paywalls. A new online catalogue called the General Index aims to make it easier to access and search through the world’s research papers. Unlike other databases which include the full text of research papers, the General Index only allows users to access snippets of content….” 

Giant, free index to world’s research papers released online

“In a project that could unlock the world’s research papers for easier computerized analysis, an American technologist has released online a gigantic index of the words and short phrases contained in more than 100 million journal articles — including many paywalled papers.

The catalogue, which was released on 7 October and is free to use, holds tables of more than 355 billion words and sentence fragments listed next to the articles in which they appear. It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers, says its creator, Carl Malamud. He released the files under the auspices of Public Resource, a non-profit corporation in Sebastopol, California that he founded.

Malamud says that because his index doesn’t contain the full text of articles, but only sentence snippets up to five words long, releasing it does not breach publishers’ copyright restrictions on the re-use of paywalled articles. However, one legal expert says that publishers might question the legality of how Malamud created the index in the first place….”

Archivists Create a Searchable Index of 107 Million Science Articles

“The General Index is here to serve as your map to human knowledge. Pulled from 107,233,728 journal articles, The General Index is a searchable collection of keywords and short sentences from published papers that can serve as a map to the paywalled domains of scientific knowledge.

In full, The General Index is a massive 38 terabyte archive of searchable terms. Compressed, it comes to 8.5 terabytes. It can be pulled directly from archive.org, which can be a difficult and lengthy process. People on the /r/DataHoarder subreddit have uploaded the data to a remote server and are spreading it across BitTorrent. You can help by grabbing a seed here.

The General Index does not contain the entirety of the journal articles it references, simply the keywords and n-grams (a string of simple phrases containing a keyword) that make tracking down a specific article easier. “This is an early release of the general index, a work in progress,” Carl Malamud, the founder of Public.Resource.org and co-creator of the General Index, said in a video about the archive. “In some cases text extraction failed, sometimes metadata is not available or is perhaps incorrect. While the underlying corpus is large, it is not complete and it is not up to date.”…”
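To make the idea of an n-gram index concrete, here is a minimal sketch of how words and short phrases (up to five words long, as the excerpts above describe) could be listed next to the articles in which they appear. It is an illustration only: the function names, the article identifiers, and the in-memory layout are hypothetical, and nothing here reflects the actual file format or tooling of the General Index.

```python
from collections import defaultdict


def ngrams(text, max_n=5):
    """Yield every run of 1..max_n consecutive words in the text."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def build_index(articles, max_n=5):
    """Map each n-gram to the set of article IDs that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for gram in ngrams(text, max_n):
            index[gram].add(article_id)
    return index


# Hypothetical article IDs and one-line "full texts", for illustration only.
articles = {
    "doi:10.0000/example-a": "Open access improves research visibility",
    "doi:10.0000/example-b": "Paywalls limit research visibility",
}

index = build_index(articles)
print(sorted(index["research visibility"]))
```

Looking up a phrase returns only the identifiers of matching articles, never their full text, which is the property Malamud relies on when he says the index contains only snippets.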

California nonprofit pushes states to make jury instructions more broadly available

“The U.S. Supreme Court’s 2020 Georgia v. Public.Resource.Org Inc. ruling, however, gave him new hope. In a 5-4 decision issued last April, the court sided with the California nonprofit Public.Resource.Org, which had been sued for copyright infringement by the State of Georgia for having purchased and posted the state’s official statutory code.

Chief Justice John G. Roberts Jr. wrote in the majority opinion that “officials empowered to speak with the force of law cannot be the authors of—and therefore cannot copyright—the works they create in the course of their official duties.”

In May, Lanning contacted Carl Malamud, Public.Resource.Org’s founder and president, seeking assistance with his jury instructions efforts.

In the months since, Malamud has succeeded in prompting Wisconsin to make its jury instructions available for free. His nonprofit has also teamed with a University of California at Berkeley legal clinic in hopes of convincing California officials to remove copyright claims on the state’s jury instructions….”

Free the California Jury Instructions: Call for Legal Practitioner, Law Professor and Law Librarian Support for a California Rule Change Proposal

“We at the Samuelson Law, Technology & Public Policy Clinic at the University of California, Berkeley, School of Law are representing Public.Resource.Org in a petition to the Judicial Council of California to clarify that California’s jury instructions are in the public domain and free for public use. We’re requesting support for the petition from legal practitioners, law professors and law librarians. Please consider signing the statement below; thank you!

Your name, title, and institutional affiliation will accompany the below statement as a signatory. Your affiliation is for identification purposes only; we will make clear that it does not imply endorsement by your firm, law school, or other institution….”

Statement by AERA, APA, and NCME on Withdrawal of Lawsuit Against Public.Resource.Org

“We are pleased to announce that the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education have dropped their lawsuit against Public.Resource.Org, Inc. That lawsuit had been commenced to protect our copyright and enjoin Public.Resource.Org’s online publication of the 1999 edition of our acclaimed Standards for Educational and Psychological Testing, the joint work product of the three organizations. The 1999 edition has been superseded by the substantially-revised 2014 edition of the Standards, and we recently determined that we would ourselves make the 1999 edition available online without charge for those who wished to view it for historical, scholarly, or other purposes. In light of our own decision to make the 1999 edition freely available, we decided to withdraw our objection to Public.Resource.Org’s online publication of that volume.

We are continuing to review our policies regarding subsequent editions of the Standards, bearing in mind the importance of our work and our desire to ensure that the Standards will continue to play a vital and leading role in supporting proper practices in educational and psychological testing….”

Education Groups Drop Their Lawsuit Against Public.Resource.Org, Give Up Their Quest to Paywall the Law

“This week, open and equitable access to the law got a bit closer. For many years, EFF, along with co-counsel at Fenwick & West and attorney David Halperin, has defended Public.Resource.Org in its quest to improve public access to the law — including standards, like the National Electrical Code, that legislators and agencies have made into binding regulations. In two companion lawsuits, six standards development organizations sued Public Resource in 2013 for posting standards online. They accused Public Resource of copyright infringement and demanded the right to keep the law behind paywalls.

Yesterday, three of those organizations dropped their suit. The American Educational Research Association (AERA), the National Council on Measurement in Education (NCME), and the American Psychological Association (APA) publish a standard for writing and administering tests. The standard is widely used in education and employment contexts, and several U.S. federal and state government agencies have incorporated it into their laws….”

Official Code of Georgia Annotated now a Github Repo | Boing Boing

“You might think a Supreme Court ruling in our favor would be enough to get governments to change their tune, but Georgia hasn’t done a thing, nor have other states that try and build walls around their laws. The State doesn’t publish their code, and the awful site they refer you to is run by Lexis, only provides the unannotated unofficial code of Georgia, and subjects you to onerous terms of use, an awful design, and a total lack of respect for laws that mandate access to the visually impaired, which is why Public Resource is spending thousands of dollars per year with the official vendor to get copies of the laws of Georgia, Mississippi, and a handful of other states. Georgia alone is costing us $1,324 per year!

What we get for our yearly subscription is a quarterly CD-ROM for each state that only runs on Windows. You can, with some difficulty, export the titles of the code as Microsoft Word files in .rtf format. Well, we now have 8 quarterly releases of code extracted as .rtf files and hosted on the Internet Archive, with transformations to Open Document format. These .rtf files are not the greatest. Any links have been removed and there is no structure—lists, for example, are not lists, just ordinary paragraphs.

Today, I am delighted to announce that we’ve taken the next step. Working with my friends at Unicourt and their crack engineering team in Mangaluru, India, we’re releasing today a github repository that transforms those .rtf files into beautiful html. The RTF parser is the code that does the transformation. It puts structure, metadata, and accessibility back to the code. Any pointers to other code sections are marked, tables of contents now work properly, and we’ve tagged references to other resources such as the U.S. Code, Code of Federal Regulations, and other federal and state materials so that over time these will become more and more useful. A second github repository holds the Georgia transforms and over the next year, we’re going to be adding Arkansas, Colorado, Kentucky, Mississippi, and Tennessee. We’re also hoping to add an xml diff capability, so we can generate redlines. If you just want to browse the html files, you can also view them on the Internet Archive. For example, here is Title 1 of the OCGA, current as of August, 2020. Just for good measure, we also added opinions of the Attorney General and the court rules….”

Public Resource Receives $5 million, 5-year Grant From Arcadia

“Public.Resource.Org (“Public Resource”) is pleased and delighted to announce that we have received a $5 million grant from Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin. This grant will support our work from 2020-2025, and is in addition to the $1.5 million in funding from Arcadia which supports our work from 2018-2020. This kind of sustained support over the long haul is so rare in the world of nonprofits, and Public Resource is very grateful for the help and inspiration Arcadia has shown us….”

Justices debate allowing state law to be “hidden behind a pay wall” | Ars Technica

“The courts have long held that laws can’t be copyrighted. But if the state mixes the text of the law together with supporting information, things get trickier. In Monday oral arguments, the US Supreme Court wrestled with the copyright status of Georgia’s official legal code, which includes annotations written by LexisNexis.

The defendant in the case is Public.Resource.Org (PRO), a non-profit organization that publishes public-domain legal materials. The group obtained Georgia’s official version of state law, known as the Official Code of Georgia Annotated, and published the code on its website. The state of Georgia sued, arguing that while the law itself is in the public domain, the accompanying annotations are copyrighted works that can’t be published by anyone except LexisNexis.

Georgia won at the trial court level, but PRO won at the appeals court level. On Monday, the case reached the US Supreme Court.

During Monday’s oral argument, some justices seemed skeptical of Georgia’s position.

“Why would we allow the official law to be hidden behind a pay wall?” asked Justice Neil Gorsuch.

Georgia’s lawyer countered that the law wasn’t hidden behind a paywall—at least not the legally binding parts. LexisNexis offers a free version of Georgia’s code, sans annotations, on its website.

But that version isn’t the official code. LexisNexis’ terms of service explicitly warns users that it might be inaccurate. The company also prohibits users from scraping the site’s content. If you want to own the latest official version of the state code, you have to pay LexisNexis hundreds of dollars. And if you want to publish your own copy of Georgia’s official code, you’re out of luck….”

Georgia v. PublicResource.Org: Copyright Case Before the Supreme Court | Authors Alliance

“The Code Revision Commission (the “Commission”), an arm of the State of Georgia’s General Assembly, is mandated to ensure publication of the statutes adopted by the General Assembly. It does so by contracting with the LexisNexis Group (“Lexis”) to maintain, publish, and distribute the Official Code of Georgia Annotated (“OCGA”), an annotated compilation of Georgia’s statutes. Following guidelines provided by the Commission, Lexis prepares and sells OCGA, which includes the statutory text of Georgia’s laws and annotations (such as summaries of judicial decisions interpreting or applying particular statutes). Lexis also makes unannotated versions of the statutes available online.

Public.Resource.Org (“PRO”) is a non-profit organization that promotes access to government records and primary legal materials. PRO makes government documents available online, including the official codes and other rules, regulations, and standards legally adopted by federal, state, and local authorities, giving the public free access to these documents. PRO purchased printed copies of the OCGA, digitized its content, and posted copies online through its own website.

Georgia filed suit against PRO claiming copyright infringement. Before the lower courts, PRO invoked the judicially-created “government edicts” doctrine. As a matter of public policy, courts have held that government edicts having the force of law, such as statutes and judicial decisions, are not eligible for copyright protection. While the court of first instance agreed with the State of Georgia and the OCGA was found to be copyrightable, on appeal the Eleventh Circuit held that under the government edicts doctrine, OCGA is not copyrightable and rejected Georgia’s infringement claim against PRO. Now, the issue before the Supreme Court is whether Georgia can claim copyrights over the OCGA annotations or if it is prevented from doing so because the annotations are an edict of government….”

Supreme Court to decide if Georgia code is free to the public

“On Monday, the U.S. Supreme Court will take up that question as the justices consider whether the annotated version of Georgia code is protected under copyright law or should be made available to the public free of charge.

The hotly disputed case, pitting the state against an open records proponent, has caught the attention of the Trump administration, whose lawyers say Georgia’s code should be protected. At the same time, news media and civil rights organizations are also weighing in, contending the public should have unhindered access to the state code….”

Clinic Files Law Scholar Briefs, Supporting Public.Resource.Org | Cyberlaw Clinic

“On Friday, November 22, 2019, the Cyberlaw Clinic and local counsel Marcia Hofmann filed amicus briefs in the United States District Court for the District of Columbia in two related cases, ASTM v. Public.Resource.Org (.pdf), and AERA v. Public.Resource.Org (.pdf). The cases involve copyright infringement claims brought by standards development organizations (SDOs) against Public.Resource.org. The cases are back before the United States District Court for the District of Columbia on remand from the United States Court of Appeals for the District of Columbia Circuit. The core issue in front of the Court is whether PRO’s provision of free online access to codes that were developed by the plaintiffs — but incorporated by reference into binding law — constitutes fair use….”

Clinic Files SCOTUS Brief w/Caselaw Access Project, Arguing for Unburdened Access to Law | Cyberlaw Clinic

“This week, the Cyberlaw Clinic filed an amicus brief (pdf) in the United States Supreme Court in the case, Georgia, et al. v. Public.Resource.Org Inc, No. 18-1150. The Clinic filed the brief on behalf of the Caselaw Access Project (CAP), a team of legal researchers, software developers, and law librarians based in the Harvard Law Library. The Clinic’s brief advocates for upholding the Eleventh Circuit’s holding in favor of the respondent, Public.Resource.Org (PRO), arguing for easy, universal, and unrestricted access to the law. The case raises one major copyright concern: does the “government edicts doctrine” extend to (and therefore render uncopyrightable) materials that lack the force of, but are published alongside, and sometimes even inextricably mixed with, the law?…”