
Copyright, AI and Unlocking Scholarly Publishing with Casey Fiesler

Melissa Cantrell

Melissa Cantrell, associate professor, scholarly communication librarian, Center for Research Data & Digital Scholarship

As artificial intelligence tools like ChatGPT and Midjourney become increasingly integrated into academic research and writing, questions around copyright, authorship, and ethical use are gaining urgency. In this installment of “Unlocking Scholarly Publishing,” Melissa Cantrell, Scholarly Communication Librarian at the University Libraries, spoke with Associate Professor Casey Fiesler from the College of Communication, Media, Design and Information. A leading scholar in technology ethics, internet law and online communities, Fiesler brings a nuanced perspective to the evolving intersection of AI and copyright.

Please note that nothing in this edition of “Unlocking Scholarly Publishing” should be construed as legal advice.

Can you outline the main copyright concerns for authors using AI in their scholarship?

With respect to copyright specifically, I typically think of three major categories of questions related to generative AI: whether AI output is copyrightable, whether AI output could be copyright infringement (e.g., of work in the model’s training data), and whether the use of content to train AI systems in the first place was copyright infringement. Researchers using a large language model (LLM) like ChatGPT in their writing are probably most concerned with the first two. If a researcher is using an LLM to assist them rather than to do their writing for them, the legal risk may be low. However, as with many concerns around AI right now, even when the legal risks are low, there are ethical questions that are perhaps even more important, including accountability for errors, research integrity, appropriate attribution and disclosure.

I suspect that copyright is relatively low on the list of generative AI issues that many researchers and academic authors are grappling with! It also isn’t surprising, however, that we are hearing so much about copyright in the context of regulation, since many of the broader concerns around generative AI (e.g., privacy, environmental impacts, disinformation, labor impacts) have fewer existing laws that offer regulatory levers to pull.

If a researcher uses AI in the course of their research and writing, do they still have a copyright claim over the research outputs?

United States law requires human authorship for a work to be copyrightable. In theory, this means that an output wholly generated by a generative AI tool, such as text written by ChatGPT or an image created by Midjourney, cannot be copyrighted.

However, if a researcher is using an AI tool to assist them rather than having it do the work for them, it seems likely that the researcher would be playing a sufficiently creative and controlling role to hold a copyright in the human-authored components. It is worth noting, though, that the Copyright Office has explicitly said that simply entering a prompt does not qualify as human authorship!

Casey Fiesler

Associate Professor Casey Fiesler researches and teaches in the areas of technology ethics, internet law and policy, and online communities.

Another consequence of the human authorship requirement is that an AI tool itself cannot hold a copyright. Similarly, a number of scholarly organizations and publications have clarified that an AI tool cannot be listed as an author on published work (in part because a human must take responsibility for submitted work), and they also require authors to be transparent about any AI use.

Therefore, in addition to understanding things like the copyright status of published works, authors should familiarize themselves with the policies and norms for acceptable AI use at any given publication, as well as within their research community more broadly.

Unless an author transfers their copyright to another entity, such as a publication venue, they own that work and therefore hold the exclusive right to make and distribute copies, except to the extent that they license those rights to someone else (again, typically a publication venue).

Published papers and other research outputs have almost certainly been part of the training data for generative AI systems. However, AI companies are largely arguing that using copyrighted content as training data constitutes fair use, an exception in U.S. copyright law. This question is a significant legal battleground right now and is far from settled. The recent District Court decision in Bartz v. Anthropic did offer some early support for the fair use argument in a narrow context, though it was only a partial win, and Anthropic still agreed to a $1.5 billion class action settlement with authors and book publishers, a class that may include academic authors.

Still, a large number of copyright cases remain pending, and fair use decisions tend to depend heavily on fact-specific elements such as how the content was acquired, how the model reproduces the content, and whether outputs cause market harm. Although a copyright holder always has the option to try to enforce their rights, collective action by organizations and rightsholders such as publishers will likely have more impact in shaping the landscape. Researchers may want to join in advocating for greater transparency and better data provenance practices in AI development, so that there is a clearer understanding of when and how our work is used.

Given the unsettled landscape of copyright, AI, and scholarly publishing, are there any good resources for keeping up with case decisions, conversations, and trends around this topic?

For those interested in digging deeper into this topic, the U.S. Copyright Office has multiple reports that analyze the copyright law issues raised by AI, including copyrightability and use of training data.

The Authors Alliance maintains an FAQ and a set of resources on the use of generative AI.

I also have some videos and resources on AI and IP on my public-facing AI ethics syllabus.