Tuesday, November 5, 2024

The Heart of the Matter: Copyright, AI Training, and LLMs; SSRN, November 1, 2024

Daniel J. Gervais, Vanderbilt University - Law School

Noam Shemtov, Queen Mary University of London, Centre for Commercial Law Studies

Haralambos Marmanis, Copyright Clearance Center

Catherine Zaller Rowland, Copyright Clearance Center

"Abstract

This article explores the intricate intersection of copyright law and large language models (LLMs), a cutting-edge artificial intelligence technology that has rapidly gained prominence. The authors provide a comprehensive analysis of the copyright implications arising from the training, fine-tuning, and use of LLMs, which often involve the ingestion of vast amounts of copyrighted material. The paper begins by elucidating the technical aspects of LLMs, including tokenization, word embeddings, and the various stages of LLM development. This technical foundation is crucial for understanding the subsequent legal analysis. The authors then delve into the copyright law aspects, examining potential infringement issues related to both inputs and outputs of LLMs. A comparative legal analysis is presented, focusing on the United States, European Union, United Kingdom, Japan, Singapore, and Switzerland. The article scrutinizes relevant copyright exceptions and limitations in these jurisdictions, including fair use in the US and text and data mining exceptions in the EU. The authors highlight the uncertainties and challenges in applying these legal concepts to LLMs, particularly in light of recent court decisions and legislative developments. The paper also addresses the potential impact of the EU's AI Act on copyright considerations, including its extraterritorial effects. Furthermore, it explores the concept of "making available" in the context of LLMs and its implications for copyright infringement. Recognizing the legal uncertainties and the need for a balanced approach that fosters both innovation and copyright protection, the authors propose licensing as a key solution. They advocate for a combination of direct and collective licensing models to provide a practical framework for the responsible use of copyrighted materials in AI systems.

This article offers valuable insights for legal scholars, policymakers, and industry professionals grappling with the copyright challenges posed by LLMs. It contributes to the ongoing dialogue on adapting copyright law to technological advancements while maintaining its fundamental purpose of incentivizing creativity and innovation."
