Going Digital

Abstract:  For all the glorious past it once had, copyright has been on the defensive for quite some time now. In recent years, the tide has turned the other way. Copyleft, a neologism invented by the computer programmer Don Hopkins, says it all: ‘Copyleft, all rights reversed!’ The shift from ink on paper to pixels on the screen as the dominant medium of our time dealt a deadening blow to copyright. The technological transition to a globe-spanning network of connected computers was reckoned a revolution, an ‘access revolution’. At least, that was the expression used by the American philosopher and leading voice of the Open Access movement, Peter Suber. The movement seeks to provide academic consumers with access to an online literature that is free of charge and free of most copyright restrictions. But he is not alone in his campaign. Another case in point is Creative Commons. Following in the footsteps of the Open Source and Free Software movements, two offshoots of the Copyriots phenomenon, Creative Commons was founded in 2001 as an alternative to standard copyright. Its main goal is to provide more open terms for the online sharing of creative works. The main difference is that Creative Commons licences permit gradations of copyright protection: authors are allowed to choose which rights they want to retain and which they are willing to waive in order to reach a wider public. It is a legal device that uses technology to protect not the author but the public domain, now an institution in its own right.


GLAM Wiki 2023/Program/Towards a Recommendation on Open Culture – What, When, How? – Meta

“Creative Commons’ Towards a Recommendation on Open Culture (TAROC) is a community initiative aiming to develop an international policy framework to recognize the importance of and support global open sharing of culture. As we build this initiative, we are seeking community engagement from GLAM professionals.

In September last year, UNESCO declared culture a global public good at Mondiacult 2022. With the successes of the 2019 UNESCO Recommendation on Open Educational Resources and 2021 Recommendation on Open Science, the world looks to UNESCO’s leadership to create the necessary international framework that would unlock the possibilities of equitable, ethical, and respectful sharing of cultural heritage in the digital age: a UNESCO Recommendation on Open Culture.

In this session, we will present our work around the initiative thus far and what we plan next. We will also hold space to hear from participants in Latin America about how open culture experiences in the region could help inform this international initiative….”

Open Access Best Practices and Licensing – Sridhar Gutam

“Within scholarly communication, open licensing plays a pivotal role in making work openly accessible while preserving rights and control. Open licenses facilitate dissemination, collaboration, and knowledge exchange by offering clarity and reducing access barriers. They promote transparency and can be applied to various research outputs, seamlessly aligning with OA principles. Open licensing extends permissions beyond default copyright law, granting creators the ability to define how others can access, engage with, share, and build upon their work. Creative Commons licenses exemplify this approach….”

The Future of Openness: IFLA at the Creative Commons Global Summit and what it means for libraries – IFLA

“Towards a Recommendation on Open Culture (TAROC)

Creative Commons is leading an initiative that will hopefully culminate in the establishment of an international Recommendation on Open Culture, a companion to UNESCO’s recommendations on Open Education Resources and Open Science.

TAROC seeks to advance the declaration of cultural ministers at the 2022 UNESCO World Conference on Cultural Policies and Sustainable Development (Mondiacult), in which culture was recognised as a global public good.

Global public goods are by definition non-exclusive and not bound to economic gain. While the parameters of culture as a global public good are still the subject of much discussion, cultural institutions can act in the spirit of fostering a cultural commons by enabling their collections to be openly accessible, shareable, and reusable.

This initiative opens up a wealth of opportunities to advocate for the role of libraries and their collections in supporting the development of thriving knowledge societies. It also links cultural policy with information and digital rights policies – a place where the library field is well suited to act as key stakeholders….”

Chan Zuckerberg Initiative Funds New Project to Openly License Life Sciences Preprints – Creative Commons

“Today, Creative Commons (CC) is excited to announce new programmatic support from the Chan Zuckerberg Initiative (CZI) to help make openly licensed preprints the primary vehicle of scientific dissemination….

The eighteen-month grant will enable CC to collaborate with CZI on a project focused on significantly increasing use of the CC BY 4.0 license on preprints in the life sciences by working with funders, preprint servers, and other preprint stakeholders….”

A look at CC’s Open Culture Roundtable in Lisbon

“Group photo at CC’s Open Culture Roundtable in Lisbon” licensed CC BY 4.0

As part of our Open Culture Program, we at Creative Commons (CC) are exploring avenues to build momentum towards a UNESCO Recommendation on Open Culture. On 11 May 2023, we hosted our first in-person Open Culture event in Lisbon, Portugal. In this blog post, we look back at the day’s highlights and map out next steps.


Over the past decade, the open movement has made incredible strides in the cultural sector — take a look at some of the pioneers — yet it is still facing major barriers and challenges. But challenges are opportunities in disguise. In September last year, UNESCO declared culture a global public good at Mondiacult 2022. With the successes of the 2019 UNESCO Recommendation on Open Educational Resources and 2021 Recommendation on Open Science, the world looks to UNESCO’s leadership to create the necessary international framework that would unlock the possibilities of equitable, ethical, and respectful sharing of cultural heritage in the digital age: a UNESCO Recommendation on Open Culture. For an explanation of useful terms related to open culture, take a look at the glossary developed by the CC open culture platform.

Meeting highlights

Recognizing that such an international instrument requires deliberative, inclusive community consultations, the in-person event focused on the foundational work of gathering community input. Structured around a co-created agenda and under the able guidance of Abdul Dube and Mona Ebdrup, facilitators at Visual Confidence, just under 40 experts gathered to exchange views and open initial discussions on the need to realize open culture as a global public good. 

Participants came from far and wide across the open movement and beyond, spanning the fields of law, library science, policy, design, anthropology, history, museum curation, international organizations, and many others. Attending from CC’s team were Brigitte Vézina, Director of Policy and Open Culture; Connor Benedict, Open Culture Coordinator; Jennryn Wetzler, Director of Learning and Training; and Jocelyn Miyara, Open Culture Manager. 

During convivial, engaged, polyphonous and cross-pollinating conversations, we exchanged our diverse perspectives; explored potential common grounds on backgrounds and contexts, core issues, and key principles; built a common understanding of what we collectively want to achieve; and elaborated a skeleton of a shared vision for “open culture.” Issues discussed included the role of copyright over access to cultural heritage, the impact of artificial intelligence, the “platformization” of culture, a sense of a generational shift in the open movement, the need to account for ethical sharing, the economics of open culture, open beyond “GLAMs” (galleries, libraries, archives and museums), the need for diversity and inclusivity in global and local contexts (including traditional knowledge and Indigenous rights), a vision for open culture in 100 years, and a lot more!

Take a look at the meeting’s graphic record, offering a visual summary of the diverse perspectives that felt most resonant within our breakout groups and that surfaced in plenary debriefs.

“Snap Shot: Visual Agenda,” a hand-drawn agenda for the day. Our work together was organized in three flows, or movements: Flow #1: Mapping our Collective Knowledge; Flow #2: Context Mapping; and Flow #3: Bold Steps. These movements were designed to help us gather our collective knowledge and hold multiple perspectives and truths at the same time. © Creative Commons / Abdul Dube and Mona Ebdrup, CC BY 4.0

“#01 Flow: History of Open Culture,” a hand-drawn graphic recording of our conversations. In our first flow, participants were divided into five groups to discuss the history of open culture, then came back together to identify key themes, including ownership and access, a focus on western references and achievements, new opportunities emerging in a quickly changing context, missing global south perspectives, and the emergence of AI with its risks and benefits. © Creative Commons / Abdul Dube and Mona Ebdrup, CC BY 4.0

“#02 Flow: Context Mapping,” a hand-drawn graphic recording of our conversations. In Flow #2, groups met again to discuss the context around open culture, including the political climate, internal and outside trends, the economic climate, tech factors, stakeholder needs, and uncertainties. A few major themes were the desire to build bridges, the need to accommodate the diverse needs of diverse stakeholders, and the desire to identify our shared goals and better define the role of open culture. © Creative Commons / Abdul Dube and Mona Ebdrup, CC BY 4.0

“#03 Flow: Bold Steps towards Open Culture,” a hand-drawn graphic recording of our conversations. In our third and final flow, we discussed what bold steps we might take toward our shared vision of the future of open culture, focusing on our values, supports, and challenges. Major themes included partnerships between public and private actors; the division between public and private interests and the need to build sustainable and reliable relationships; the need for economic resources to support open; and the desire to meet with more diverse stakeholders. © Creative Commons / Abdul Dube and Mona Ebdrup, CC BY 4.0

Participants appreciated the opportunity to meet peers and build new relationships, and got a sense of the possibilities of going down a common path together. While in-person meetings such as this cannot include all of the perspectives needed, participants noted the value of in-person discussions to probe various approaches to open culture deeply. We aim to offer additional avenues to include more perspectives in follow-up activities.

Here’s what some of the participants shared about their experience: 

“The CC Open Culture Roundtable was an opportunity to meet and engage with open culture experts and advocates around the world and see how, despite the many contextual differences, there are meaningful ways for us to collaborate and shape nuanced, context-mindful perspectives for projects and policies aiming at a shared and open culture.”

Mariana Valente, Assistant Professor in law, University of St. Gallen, Switzerland and Associate Director, InternetLab (Brazil).


“It was a great opportunity to hold first discussions about the initiative, and it allowed me to reflect on possible options further.” 

Gašper Hrastelj, Secretary General, Slovenian National Commission for UNESCO


“The CC Open Culture Roundtable was a perfect opportunity to meet in person to discuss Open Culture, and it allowed me to enlarge my view and learn other perspectives.”

Deborah de Angelis, Chapter Lead, Creative Commons Italy


“The CC Open Culture Roundtable was a great opportunity to meet people from diverse organizations and parts of the world, and it allowed me to see different perspectives on IP and ‘openness’ as a concept and movement.”

Matt Voigts, Copyright and Open Access Policy Officer, IFLA


“It was an excellent opportunity to bring different open culture stakeholders together and reignite and expand important discussions among them. And it allowed me to reflect on the possibilities in my reach to contribute more effectively to the progress of open culture, both locally and globally.”

Fátima São Simão, Chapter Lead, Creative Commons Portugal


“It was an inspiring opportunity to share ideas of the open culture and notice that there are a lot of people trying to solve similar questions from different angles, and it allowed me to meet many new and interesting people and to enjoy working together.”

Johanna Lilja, Director of Services, National Library of Finland, and IFLA Cultural Heritage Advisory Board


“For me, it was an opportunity to do a historical reflection exercise where we were able to look at how we have grown as a movement. And it allowed me to collaborate in the construction of a more or less common concept or idea of what is understood in different corners of the world as “open culture”. It also allowed me to connect with people who are doing amazing projects.”

Ivan Martinez, Coordinator, Creative Commons Mexico


“The CC Open Culture Roundtable was a first step on an exploratory journey into how GLAMs could be better supported through open approaches to public domain material. It allowed me to understand the diversity of stakeholders’ perspectives on the issue.”

Lutz Möller, Deputy Secretary-General, German National Commission for UNESCO


“The CC Open Culture Roundtable was a pitstop for ongoing discussions around the importance of open culture, and it allowed me to reconnect to the wider international community.”

Maarten Zeinstra, Owner, IP Squared and Member, Creative Commons Netherlands


“It was firstly a chance to meet people who are actively involved in the movement, particularly from different contexts; it allowed me to see, somewhat paradoxically, the boundaries of open culture, and to have the space to start to think about what openness means for knowledges outside of the legal frameworks of IP.”

Abira Hussein, Advisor, Whose (Digital) Archives? and Lab Partner, GLAM-E Lab


“The CC Open Culture Roundtable was a warm gathering of fellow travelers and it allowed us to imagine new ways to act together.”

Fiona Romeo, Senior Manager, Culture and Heritage, Wikimedia Foundation 

Next steps

We are excited to take the outcomes of our Lisbon event forward. We are already planning to continue the conversation at the CC Summit in Mexico in October, and hopefully at GLAM Wiki 2023 in Montevideo, Uruguay in November this year. We will also organize multiple virtual opportunities to contribute as we engage more community members in our work on open culture.

Interested in knowing more about CC’s work in the field of open culture? Join our open culture platform or write to us at info@creativecommons.org

The post A look at CC’s Open Culture Roundtable in Lisbon appeared first on Creative Commons.

Data Rivers: Carving Out the Public Domain in the Age of Generative AI by Sylvie Delacroix :: SSRN

Abstract:  The salient question, today, is not whether ‘copyright law [will] allow robots to learn’. The pressing question is whether the fragile data ecosystem that makes generative AI possible can be re-balanced through intervention that is timely enough. The threats to this ecosystem come from multiple fronts. They are comparable in kind to the threats currently affecting ‘water rivers’ across the globe.

First, just as the fundamental human right to water is only possible if ‘reasonable use’ and reciprocity constraints are imposed on the economic exploitation of rivers, so is the fundamental right to access culture, learn and build upon it. It is that right, and the moral aspirations underlying it, that has led millions to share their creative works under ‘open’ licenses. Generative AI tools would not have been possible without access to that rich, high-quality content. Yet few of those tools respect the reciprocity expectations without which the Creative Commons and Open Source movements cease to be sustainable. The absence of internationally coordinated standards to systematically identify AI-generated content also threatens our ‘data rivers’ with irreversible pollution.

Second, the process that has allowed large corporations to seize control of data, and of its definition as an asset subject to property rights, has effectively enabled the construction of hard structures (canals or dams) that lead to the rights of many of those lying up- or downstream of such structures being ignored. While data protection laws seek to address those power imbalances by granting ‘personal’ data rights, the exercise of those rights remains demanding, just as it is challenging for artists to defend their IP rights in the face of AI-generated works that threaten them with redundancy.

To tackle the above threats, the long-overdue reform of copyright can only be part of the required intervention. Equally important is the construction of bottom-up empowerment infrastructure that gives long-term agency to those wishing to share their data and/or creative works. This infrastructure would also play a central role in reviving much-needed democratic engagement. Data not only carries traces of our past. It is also a powerful tool to envisage different futures. There is no doubt that tools such as GPT-4 will change us. We would be fools to believe we may leverage those tools at the service of a variety of futures by merely imposing sets of ‘post-hoc’ regulatory constraints.

Generative AI Meets Open Culture Tickets, Tue, May 2, 2023 at 10:00 AM | Eventbrite

“With the rise of generative artificial intelligence (AI), there has been increasing interest in how AI can be used in the description, preservation and dissemination of cultural heritage. While AI promises immense benefits, it also raises important ethical considerations.

In this session, leaders from Internet Archive, Creative Commons, and Wikimedia Foundation will discuss how public interest values can shape the development and deployment of AI in cultural heritage, including how to ensure that AI reflects diverse perspectives, promotes cultural understanding, and respects ethical principles such as privacy and consent.

Join us for a thought-provoking discussion on the future of AI in cultural heritage, and learn how we can work together to create a more equitable and responsible future.”

Networking the commons: creative commons project creators funding patterns in crowdfunding | Emerald Insight

Abstract:  Purpose

Guided by collective action theory, signaling theory, and the social identity approach, this study examines backing behavior by individuals who have created projects under CC licenses. Two motivational mechanisms were examined: (1) identification via common interests in the CC space; and (2) resource signaling by other users via their diverse project creation experience, funding, or commenting activity.


Design/methodology/approach

Data were collected from Kickstarter.com. Exponential random graph modeling was used to examine how the two reviewed mechanisms influence the tie formation probability between Creative Commons (CC) project creators and other creators. The analysis was conducted on two subnetworks: one with ties between CC creators; and one with ties from CC creators to non-CC creators.


Findings

The study found that CC creators exhibit distinct backing patterns when considering funding other CC creators compared to non-CC users. When considering funding their peer CC creators, CC identity can help them allocate and support perceived in-group members; when considering funding non-CC creators, shared common interests in competitive project categories potentially trigger a competition mindset and make them hold back when they see potential rivals.


Originality/value

This study makes three contributions. First, it draws from multiple theoretical frameworks to investigate unique motivations when crowdfunders take on the dual roles of creators and funders, and offers implications for how to manage competition and collaboration simultaneously. Second, with network analysis, our study not only identifies multiple motivators at work for collective action but also demonstrates their differential effects in crowdfunding. Third, the integration of multiple theoretical frameworks allows opportunities for theory building.

Twenty years of Creative Commons licences: key legal considerations and best practice

“Creative Commons, the US-based non-profit organisation which has developed a scheme for freely licensing copyright, recently celebrated its twentieth year. While Creative Commons is not the only scheme which facilitates open content licensing, it is by far the most commonplace today. In this article, Owen O’Rorke and Ethan Ezra set out some “best practice” points when encountering these licences and examine the history, strengths, and critiques of the scheme….”

Fair Use: Training Generative AI

Like the rest of the world, CC has been watching generative AI and trying to understand the many complex issues raised by these amazing new tools. We are especially focused on the intersection of copyright law and generative AI. How can CC’s strategy for better sharing support the development of this technology while also respecting the work of human creators? How can we ensure AI operates in a better internet for everyone? We are exploring these issues in a series of blog posts by the CC team and invited guests that look at concerns related to AI inputs (training data), AI outputs (works created by AI tools), and the ways that people use AI. Read our overview on generative AI or see all our posts on AI.

“Robot Training” by Creative Commons was generated by the DALL-E 2 AI platform with the text prompt “an oil painting in the style of Pieter Jansz Saenredam of a robot learning to follow a recipe in a Dutch kitchen with a large collection of tiny artworks arranged haphazardly on shelves.” CC dedicates any rights it holds to the image to the public domain via CC0.

While generative AI as a tool for artistic expression isn’t truly new — AI has been used to create art since at least the 1970s and the art auction house Christie’s sold its first piece of AI artwork in 2018 — the past year launched this exciting and disruptive technology into public awareness.  With incredible speed, the development and widespread availability of amazing tools like Stable Diffusion and Midjourney have engendered excitement, debate, and indeed fear over what the future may hold and what role generative AI should have in the production of creative works.

Perhaps unsurprisingly to anyone who has been paying attention to the conversation around generative AI, the past year also saw the first lawsuits challenging the legality of these tools. First, in November, a group of programmers sued GitHub and OpenAI over the code-generation tool GitHub Copilot, alleging (among other things) that the tool improperly removes copyright management information from the code in its training data, in violation of the Digital Millennium Copyright Act, and reproduces code in its training data without following license agreement stipulations like attributing the code to its original author. Then, in January, a group of artists (represented by the same attorneys as in the GitHub lawsuit) sued Stability AI and Midjourney over their text-to-image art generation tools. In this second lawsuit, the artist-plaintiffs made several claims, all of which deserve discussion. In this blog post, I will address one of those claims: that using the plaintiffs’ copyrighted works (and as many as 5 billion other works) to train Stable Diffusion and Midjourney constitutes copyright infringement. As Creative Commons has argued elsewhere, and others agree, I believe that this type of use should be protected by copyright’s fair use doctrine. In this blog post I will discuss what fair use is, what purpose it serves, and why I believe that using copyrighted works to train generative AI models should be permitted under this law. I will address the other claims in both the GitHub and Stable Diffusion lawsuits in subsequent blog posts.

Copyright for public good

It is clear from both the history and origin of copyright law in the United States that copyright’s purpose is to serve the public good. We can see this in the Constitution itself. Article I, Section 8, Clause 8 of the U.S. Constitution gives Congress the power to create copyright law. This provision states that copyright law must “promote the Progress of Science and useful Arts” and that copyright protection can only last for “limited Times.” As such, any copyright law that Congress passes must be designed to support the creation of new creative works, and copyrights must eventually expire so that the collection of works that are free for us all to use — the public domain — will grow and nurture further creative endeavors. However, even while the ultimate beneficiary of copyright may be the public, the law attempts to achieve these goals by giving rightsholders several specific ways to control their works, including the right to control the reproduction and distribution of copies of their works.

With this design, copyright law attempts to strike a balance between the interests of rightsholders and the public, and when that balance breaks down, copyright cannot achieve its goals. This is where fair use comes from. Shortly after the first copyright laws were enacted, courts began to realize that it would frustrate copyright’s ability to benefit the public if rightsholders had an unlimited right to control the reproduction and distribution of their works. So, in 1841, Justice Joseph Story first articulated what would eventually become the modern test for fair use in Folsom v. Marsh. As part of that decision, he wrote that downstream uses of copyrighted works that do not “supersede the objects” of the original works should be permitted under the law.

“Fair Use Training Generative AI” by Creative Commons was generated by the Stable Diffusion AI platform with the text prompt “Fair Use Training Generative AI.” CC dedicates any rights it holds to the image to the public domain via CC0.

What is fair use?

Today, fair use, codified at 17 U.S.C. § 107, is unquestionably an essential part of copyright law in the United States. Courts, including the Supreme Court, have repeatedly emphasized the importance of fair use as a safeguard against encroachment on the rights of people to use copyrighted works in ways that rightsholders might block. Unfortunately, however, fair use is a famously hard doctrine to apply. Courts repeatedly write that there are no bright lines in what is or is not fair use, and each time we consider fair use we must conduct a case-by-case analysis. To that end, the law requires courts to consider four factors, in light of the purpose and goal of copyright law:

1. The purpose and character of the use, or what the user is doing with the original work;
2. The nature of the original work;
3. The amount and substantiality of the portion copied by the secondary use; and
4. Whether the secondary use harms the market for the original.

Even though there are no bright lines, there are some principles we can look to when weighing the four fair use factors that courts tend to consider in finding fair use, principles that are particularly relevant to how we may think about fair use and generative AI training data.

First, and perhaps most importantly, is whether the secondary use “transforms” the original in some way, or whether it “merely supersede[s]” the original. Since 1994, when the Supreme Court adopted “transformativeness” as part of the inquiry into the purpose and character of the secondary use in Campbell v. Acuff-Rose Music, this question has grown increasingly important. Today, if someone can show that their secondary use transforms the original in some way, it is much more likely to be fair use than otherwise. Importantly, however, last October the Supreme Court heard Andy Warhol Foundation v. Goldsmith, which may change how we approach transformativeness in fair use under U.S. law. Even so, it still seems likely that highly transformative uses will weigh in favor of fair use after the decision in that case.

Second, when considering the nature of the original work, we need to remember that copyright protects some works more strongly than others. Works of fiction, or works that are entirely the creative products of their authors, are protected more strongly than nonfiction works because copyright does not protect facts or ideas. As such, uses of some works are less likely to be fair use than uses of others.

Third, we need to think about how much of the original work is copied in the context of the transformativeness inquiry, and whether the amount copied serves the transformative purpose. If the amount copied fits and supports the transformative purpose, then fair use can support copying entire works.

Fourth, when we consider market harm, we need to think about whether the secondary use undermines the market for, or acts as a market substitute for, the original work.
And finally, we need to consider whether permitting a secondary use as a fair use would serve the goals of copyright.

Is AI transformative?

Given all this background on fair use, how do we apply these principles to the use of copyrighted works as AI training data, such as in the Stable Diffusion/Midjourney case? To answer this question, we must first look at the facts of the case. Dr Andrés Guadamuz has a couple of excellent blog posts that explain the technology involved in this case and that begin to explain why this should constitute fair use. Stability AI used a dataset called LAION to train Stable Diffusion, but this dataset does not actually contain images. Instead, it contains over 5 billion weblinks to image-text pairs. Diffusion models like Stable Diffusion and Midjourney take these inputs, add “noise” to them, corrupting them, and then train neural networks to remove the corruption. The models then use another tool, called CLIP, to understand the relationship between the text and the associated images. Finally, they use what are called “latent spaces” to cluster together similar data. With these latent spaces, the models contain representations of what images are supposed to look like, based on the training data, and not copies of the images in their training data. Then, user-focused applications collect text prompts from users to generate new images based on the training data, the language model, and the latent space.
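The noise-and-denoise training loop described above can be sketched in a few lines of Python. This is only a toy illustration of the general technique (a NumPy vector stands in for an image, and a simple linear map stands in for the denoising neural network; none of this is Stable Diffusion’s actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, alpha):
    """Forward diffusion step: blend the 'image' with Gaussian noise."""
    noise = rng.standard_normal(x.shape)
    x_noisy = np.sqrt(alpha) * x + np.sqrt(1.0 - alpha) * noise
    return x_noisy, noise

# A toy "image" flattened to a 16-element vector; real models use pixel tensors.
x = rng.standard_normal(16)

# Corrupt the image at a chosen noise level.
x_noisy, true_noise = add_noise(x, alpha=0.5)

# The model's job is to predict the noise that was added. Here a linear
# map W stands in for the denoising network, trained by gradient descent
# on the mean-squared error between predicted and actual noise.
W = np.zeros((16, 16))
for _ in range(200):
    pred = W @ x_noisy
    grad = np.outer(pred - true_noise, x_noisy)  # gradient of the MSE loss
    W -= 0.01 * grad                             # gradient-descent step

# After training, the model predicts the injected noise from the corrupted
# input: it has learned a statistical relationship, not stored the image.
print(np.allclose(W @ x_noisy, true_noise, atol=1e-2))
```

The point of the sketch is the one the post makes: what the trained parameters `W` encode is a relationship learned from the corrupted data, not a copy of the original input.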

Turning back to fair use, this method of using image-text combinations to train the AI model serves an inherently transformative purpose relative to the original images and should support a finding of fair use. While these images were originally created for their aesthetic value, their purpose for the AI model is only as data. For the AI, these image-text pairs are only representations of how text and images relate. What the images are does not matter for the model — they are only data that teach the model about statistical relationships between elements of the images, not pieces of art.

This is similar to how Google used digital copies of print books to create Google Books, a practice that was challenged in Authors Guild v. Google (Google Books). In that case, the Second Circuit Court of Appeals found that Google’s act of digitizing and storing copies of millions of print books to create a text-searchable database was fair use. The court wrote that Google’s purpose was different from the purpose of the original authors because Google was not using the books for their content. Indeed, the content did not really matter to Google; rather, the books were like pieces of data that were necessary to build Google’s book database. Instead of using the books for their content, Google’s purpose was to create a digital tool that permitted new ways of using print books that would be impossible in the analog world. So, the books as part of Google’s database served a very different purpose from their original purpose, which supported the finding of fair use in this case.

Moreover, it is also similar to how search engine operator Arriba Soft used copies of images in its search engine, which was litigated in Kelly v. Arriba Soft. In this case, a photographer, Leslie Kelly, sued the operator of a search engine, Arriba Soft, for copying and displaying copies of her photographs as thumbnails to users. The court, however, disagreed that this constituted copyright infringement. Instead, the court held that this use served a different and transformative purpose from the original purpose because Arriba Soft only copied Kelly’s photographs to enable its search engine to function and not because of their aesthetic value. Like Google Books, and like AI training data, the images here served a function as data for the tool, not as works of art to be enjoyed as such.

On the nature of works as AI inputs

Turning to factor two, the nature of the original works: even though we do not know what specific images are in the LAION dataset used to train Stable Diffusion and Midjourney, it is likely that these images span a wide range of creativity. While this could weigh against a finding of fair use for Stable Diffusion and Midjourney, given the presumably creative nature of the input works, this factor is rarely determinative. In fact, in Google Books, the court was skeptical that this factor would weigh against fair use even if the books in the database were fiction. This is because using the digitized books as part of the database provided information about the books and did not use them for their creative content. Similarly, in the litigation against Stable Diffusion and Midjourney, these generative AI tools use the works in their dataset as data. In this way, anything they extract from their training data might only be unprotectable elements of the works, such as facts and ideas about how various concepts are visualized. As such, because this factor is rarely a major one in fair use decisions, it seems unlikely to weigh heavily against fair use in this case.

Is AI making copies?

Third, because of how the generative AI models work, they use no more of the original works than is necessary to enable the transformative purpose when used for training. The models do not store copies of the works in their datasets, and they do not create collages from the images in their training data. Instead, they use the images only as long as they must for training. These are merely transitory copies that serve a transformative purpose as training data. Again, Google Books is helpful for understanding this. In that decision, the court wrote that Google needed to both copy and retain copies of entire books for its database to function. But this was permissible because of Google’s transformative purpose. Furthermore, Google did not permit users to access full copies of the books in the database; instead, it only revealed “snippets” to the users. On this point, the court wrote that the better question was not how much of the works Google copied, but how much was available to users. Similarly, Stable Diffusion and Midjourney would not work unless they used the entire images in their training datasets. Moreover, they do not store images, they do not reproduce images in their datasets, and they do not piece together new images from bits of images in their training data. Instead, they learn what images represent and create new images based on what they learn about the associations between text and images.

AI in the marketplace

Fourth, the issue of whether Stable Diffusion and Midjourney harm the market for the works in their training data is difficult, in part because the way courts think about this question can be inconsistent. In one sense, the answer must be yes: this use at least has the potential to harm the market for the original. That is, after all, one likely reason the plaintiffs filed this lawsuit in the first place — they are afraid that AI-generated content will cut into their ability to profit from their art. Indeed, any art has the potential to compete with other art, not necessarily because it fills the same niche, but because attention is limited, and AI-generated content has the advantage of being produced in a quick, automated fashion.

However, this may not be the best way to think about market harm in the context of using images as training data. As mentioned above, we need to think about this question in the context of the transformative purpose. In Campbell v. Acuff-Rose, the Supreme Court wrote that the more transformative the purpose, the less likely it is that the use will be a market substitute for the original. Given this, perhaps it is better to ask whether this use as training data, not as pieces of art, harms the market for the original. This use by Stability AI and Midjourney exists in an entirely different market from the original works. It does not usurp the market of the originals, and it does not operate as a market substitute, because the original works were never in the data market. Moreover, this use as training data does not “supersede the objects” of the originals and does not compete with them in the aesthetic market.

Training AI as fair use

Finally, as discussed above, since the purpose of copyright law is to encourage new creative works, to promote learning, and to benefit the public interest, fair use should permit using copyrighted works as training data for generative AI models like Stable Diffusion and Midjourney. The law should support and foster the development of new technologies that can provide benefits to the public, and fair use provides a safeguard against the cudgel of copyright being used to impede these technologies. As Mark Lemley and Bryan Casey write in a recent paper arguing that this type of use should constitute fair use: “A central problem with allowing copyright suits against ML [machine learning] is that the value and benefit of the system’s use is generally unrelated to the purpose of copyright.” In fact, the Supreme Court has recognized fair use’s importance in the development of new technologies, first in 1984 in Sony Corp. of America v. Universal City Studios and most recently in 2021 in Google v. Oracle. In Sony, the Court held that the Betamax videocassette recorder should not be sued out of existence even if it could potentially help people violate copyright law. Instead, because it was capable of “substantial, non-infringing uses,” the Court believed copyright law should not be used to stop it. Then, in Google, the Court held that Google’s use of roughly 11,500 lines of Java code was fair use, writing that courts must consider fair use in the context of technological development.

Altogether, I believe that this type of use for learning purposes, even at scale by AI, constitutes fair use, and that there are avenues outside of litigation that can give authors control over the use of their works in datasets. We can already see an example of this, to a degree, in Stability AI’s announcement that it would permit artists to opt out of having their works used for training Stable Diffusion. While this certainly isn’t a perfect solution, and opt-out is just one possible way to approach these issues, it is at least a start, and it highlights that there are ways to address these problems other than copyright-based solutions. Perhaps by looking at norms and best practices, and by engaging people in collaboration and dialogue, we can better address the concerns raised by AI training data, instead of falling back on lawsuits that force the different sides of this issue into opposition and that can create unpredictable and potentially dangerous new precedent for future technologies.

The post Fair Use: Training Generative AI appeared first on Creative Commons.

Revisiting the Openverse: Finding Open Images and Audio

Blurry bluish-black image of stars or lights at night seen through a transparent screen marked with smeared human handprints.

“art is the universe creating itself as it goes” by submerged~, here slightly cropped, is marked with Public Domain Mark 1.0.

Looking for that perfect picture to illustrate your post? That catchy tune to jazz up your video? Look no further than Openverse, the huge library of free and open stock photos, images, and audio contributed to the public commons by people around the world, now available at its new domain: openverse.org.

Here at CC we use Openverse daily to explore the public commons and find works to reuse in our communications and projects. Powerful tools like Openverse demonstrate how open technologies and communities like WordPress can build on the rich public commons we all help create to support what we call better sharing: sharing that is inclusive, just and equitable — where everyone has wide opportunity to access content, to contribute their own creativity, and to receive recognition and rewards for their contributions.

Finding and using free and open works has never been easier: Just visit Openverse, enter some keywords, and pick your favorite from the results. You can also filter by content type, sources, aspect ratio, size, open license and public domain statuses, and more, like the search for the keywords “art” and “universe” we used to find the image in this post.
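Beyond the website, Openverse also exposes a public API at api.openverse.org. As a minimal sketch, the helper below builds a search URL like the “art” and “universe” query mentioned above; the endpoint and parameter names (`q`, `license`, `source`, `page_size`) are based on the public v1 image-search API, so verify them against the current API documentation before relying on them:

```python
from urllib.parse import urlencode

# Base endpoint of the public Openverse API (v1).
API_BASE = "https://api.openverse.org/v1/images/"

def build_search_url(keywords, license=None, source=None, page_size=20):
    """Build an Openverse image-search URL for the given keywords and filters."""
    params = {"q": " ".join(keywords), "page_size": page_size}
    if license:
        params["license"] = license  # e.g. "cc0", "pd", or "by,by-sa"
    if source:
        params["source"] = source    # e.g. "flickr"
    return API_BASE + "?" + urlencode(params)

# A query like the one used for this post's image: the keywords
# "art" and "universe", restricted to public-domain works.
url = build_search_url(["art", "universe"], license="pd")
print(url)
```

Fetching that URL with any HTTP client returns a JSON page of results, each carrying the work’s license and attribution details.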

Once you’ve picked a work, Openverse provides everything you need to use it: Visit the work in its home collection and copy a well-formed attribution statement to give proper credit for your use.

Openverse was incubated here at CC as “CC Search” before moving to the WordPress community in 2021, and it has continued to thrive in its new home, now cataloging over 600 million images and audio tracks, with new collections of open works being added all the time — like the recent addition of more than 15 million images from iNaturalist, the project that enables citizen scientists and researchers to document and understand global biodiversity.

Contributors in the WordPress community continue to add new features and capabilities to Openverse. Coming up next will be new tools to easily use images from Openverse directly in WordPress itself; content safety features that will enable users to blur or opt in/out from specific types of sensitive content; and improvements to search relevancy and the quality of results.

Can you help expand the Openverse?

As a creator, share your work to the commons with a CC open license or CC0 dedication to the public domain on one of the sources already cataloged in Openverse.

Do you know a great collection of open works? Suggest a new source for Openverse.

Do you have communication and/or technical skills? Join the Openverse contributor team and help with things like testing new features, writing documentation, contributing code, and amplifying news from the project. Have a look at Openverse’s good first issues or their guide for new contributors.

EU list of specific high-value datasets and the arrangements for their publication and re-use

“(12) It is the objective of Directive (EU) 2019/1024 to promote the use of standard public licences available online for re-using public sector information. The Commission’s Guidelines on recommended standard licences, datasets and charging for the re-use of documents (5) identify Creative Commons (‘CC’) licences as an example of recommended standard public licences. CC licences are developed by a non-profit organisation and have become a leading licensing solution for public sector information, research results and cultural domain material across the world. It is therefore necessary to refer in this Implementing Regulation to the most recent version of the CC licence suite, namely CC 4.0. A licence equivalent to the CC licence suite may include additional arrangements, such as the obligation on the re-user to include updates provided by the data holder and to specify when the data were last updated, as long as they do not restrict the possibilities for re-using the data….”

OASPA Open Access License Types

“The charts show numbers of articles published in fully OA journals (left), and OA articles in hybrid journals (right), color-coded by license type. The most permissive licenses are at the bottom (CC BY), through to least permissive at the top, except for the tiny amount of CC0.

- The volume of publications from OASPA continues to grow. Just under 4M articles were published by members in the period 2000–2021.
- Just under 1M of the cumulative total were published in 2021, representing growth of around 46% over the previous year and around one quarter of total recorded output.
- The total number of articles reported by members has more than doubled since 2018, and grown around 20x over the last decade.
- Publications in fully OA journals continue to dominate output, at around 4x that in hybrid.
- CC BY licenses (Creative Commons attribution only) dominate. They account for almost three quarters of members’ total output, and for 81% of their output in fully OA journals….”