Showing posts with label LLMs. Show all posts

Thursday, March 7, 2024

Introducing CopyrightCatcher, the first Copyright Detection API for LLMs; Patronus AI, March 6, 2024

Patronus AI; Introducing CopyrightCatcher, the first Copyright Detection API for LLMs

"Managing risks from unintended copyright infringement in LLM outputs should be a central focus for companies deploying LLMs in production.

  • On an adversarial copyright test designed by Patronus AI researchers, we found that state-of-the-art LLMs generate copyrighted content at an alarmingly high rate 😱
  • OpenAI’s GPT-4 produced copyrighted content on 44% of the prompts.
  • Mistral’s Mixtral-8x7B-Instruct-v0.1 produced copyrighted content on 22% of the prompts.
  • Anthropic’s Claude-2.1 produced copyrighted content on 8% of the prompts.
  • Meta’s Llama-2-70b-chat produced copyrighted content on 10% of the prompts.
  • Check out CopyrightCatcher, our solution to detect potential copyright violations in LLMs. Here’s the public demo, with open source model inference powered by Databricks Foundation Model APIs. 🔥

LLM training data often contains copyrighted works, and it is pretty easy to get an LLM to generate exact reproductions from these texts [1]. It is critical to catch these reproductions, since they pose significant legal and reputational risks for companies that build and use LLMs in production systems [2]. OpenAI, Anthropic, and Microsoft have all faced copyright lawsuits on LLM generations from authors [3], music publishers [4], and more recently, the New York Times [5].

To check whether LLMs respond to your prompts with copyrighted text, you can use CopyrightCatcher. It detects when LLMs generate exact reproductions of content from text sources like books, and highlights any copyrighted text in LLM outputs. Check out our public CopyrightCatcher demo here!"
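For readers wondering what detecting "exact reproductions" can look like in practice, here is a minimal sketch in Python. It is not Patronus AI's method or the CopyrightCatcher API, neither of which is described in the excerpt. It only assumes you already hold a reference text and want to find the longest run of consecutive words an LLM output shares with it. The function names and the 8-word threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def longest_verbatim_overlap(output: str, source: str) -> str:
    """Return the longest run of consecutive words shared by both texts."""
    out_words = output.split()
    src_words = source.split()
    # autojunk=False so frequent words are not discarded as "junk"
    matcher = SequenceMatcher(None, out_words, src_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(src_words))
    return " ".join(out_words[match.a : match.a + match.size])


def looks_like_reproduction(output: str, source: str, min_words: int = 8) -> bool:
    """Flag an output whose longest shared run reaches min_words (assumed threshold)."""
    overlap = longest_verbatim_overlap(output, source)
    return len(overlap.split()) >= min_words


# Public-domain example text (Dickens), used here only as a stand-in source.
source_text = (
    "it was the best of times it was the worst of times "
    "it was the age of wisdom"
)
llm_output = (
    "The model said: it was the best of times "
    "it was the worst of times indeed"
)

print(longest_verbatim_overlap(llm_output, source_text))
print(looks_like_reproduction(llm_output, source_text))
```

A word-level match like this only catches verbatim copying. Paraphrases or lightly edited passages would need a different approach, and whether any overlap amounts to infringement is a legal question, not something the code decides.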

Saturday, January 27, 2024

Library Copyright Alliance Principles for Copyright and Artificial Intelligence; Library Copyright Alliance (LCA), American Library Association (ALA), Association of Research Libraries (ARL), July 10, 2023

Library Copyright Alliance (LCA), American Library Association (ALA), Association of Research Libraries (ARL); Library Copyright Alliance Principles for Copyright and Artificial Intelligence

"The existing U.S. Copyright Act, as applied and interpreted by the Copyright Office and the courts, is fully capable at this time to address the intersection of copyright and AI without amendment.

  • Based on well-established precedent, the ingestion of copyrighted works to create large language models or other AI training databases generally is a fair use.

    • Because tens—if not hundreds—of millions of works are ingested to create an LLM, the contribution of any one work to the operation of the LLM is de minimis; accordingly, remuneration for ingestion is neither appropriate nor feasible.

    • Further, copyright owners can employ technical means such as the Robots Exclusion Protocol to prevent their works from being used to train AIs.

  • If an AI produces a work that is substantially similar in protected expression to a work that was ingested by the AI, that new work infringes the copyright in the original work.

    • If the original work was registered prior to the infringement, the copyright owner of the original work can bring a copyright infringement action for statutory damages against the AI provider and the user who prompted the AI to produce the substantially similar work.

  • Applying traditional principles of human authorship, a work that is generated by an AI might be copyrightable if the prompts provided by the user sufficiently controlled the AI such that the resulting work as a whole constituted an original work of human authorship.

AI has the potential to disrupt many professions, not just individual creators. The response to this disruption (e.g., support for worker retraining through institutions such as community colleges and public libraries) should be developed on an economy-wide basis, and copyright law should not be treated as a means for addressing these broader societal challenges.

AI also has the potential to serve as a powerful tool in the hands of artists, enabling them to express their creativity in new and efficient ways, thereby furthering the objectives of the copyright system."
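The principles quoted above mention the Robots Exclusion Protocol as a technical means of opting out. As a rough illustration of how such a rule is read, the sketch below uses Python's standard `urllib.robotparser` on an in-memory robots.txt. The crawler name `ExampleAIBot` and the example.org URL are hypothetical placeholders, not names taken from the LCA document.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: block one (made-up) AI crawler, allow everyone else.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.org/articles/1"
ai_bot_allowed = parser.can_fetch("ExampleAIBot", url)
other_allowed = parser.can_fetch("SomeOtherAgent", url)

print("ExampleAIBot may fetch:", ai_bot_allowed)
print("SomeOtherAgent may fetch:", other_allowed)
```

The protocol is advisory: a rule like this only has an effect when the crawler chooses to check and honor it.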

Training Generative AI Models on Copyrighted Works Is Fair Use; ARL Views, January 23, 2024

Katherine Klosek, Director of Information Policy and Federal Relations, Association of Research Libraries (ARL), and Marjory S. Blumenthal, Senior Policy Fellow, American Library Association (ALA) Office of Public Policy and Advocacy, ARL Views; Training Generative AI Models on Copyrighted Works Is Fair Use

"In a blog post about the case, OpenAI cites the Library Copyright Alliance (LCA) position that “based on well-established precedent, the ingestion of copyrighted works to create large language models or other AI training databases generally is a fair use.” LCA explained this position in our submission to the US Copyright Office notice of inquiry on copyright and AI, and in the LCA Principles for Copyright and AI.

LCA is not involved in any of the AI lawsuits. But as champions of fair use, free speech, and freedom of information, libraries have a stake in maintaining the balance of copyright law so that it is not used to block or restrict access to information. We drafted the principles on AI and copyright in response to efforts to amend copyright law to require licensing schemes for generative AI that could stunt the development of this technology, and undermine its utility to researchers, students, creators, and the public. The LCA principles hold that copyright law as applied and interpreted by the Copyright Office and the courts is flexible and robust enough to address issues of copyright and AI without amendment. The LCA principles also make the careful and critical distinction between input to train an LLM, and output—which could potentially be infringing if it is substantially similar to an original expressive work.

On the question of whether ingesting copyrighted works to train LLMs is fair use, LCA points to the history of courts applying the US Copyright Act to AI."