Showing posts with label AI training data. Show all posts

Monday, October 16, 2023

Writers Guild AI Deal Pushes Studios Down New Copyright Path; Bloomberg Law, October 12, 2023

Kyle Jahner, Bloomberg Law; Writers Guild AI Deal Pushes Studios Down New Copyright Path

"Movie and television studios will have to monitor their use of AI in the script-writing process or face copyright complications following the recent deal with the screenwriters’ union.

Provisions restricting—in some scenarios banning—use of AI in content fed to writers are embedded among other aspects of the collective bargaining agreement with Hollywood studios ratified by Writers Guild of America members Monday, less than two weeks after governing boards of the writers’ union ended the nearly five-month strike. The WGA also secured the right to bar the use of writers’ material to train AI models."

Thursday, October 12, 2023

Google promises to take the legal heat in users’ AI copyright lawsuits; The Verge, October 12, 2023

Emilia David, The Verge; Google promises to take the legal heat in users’ AI copyright lawsuits

"Google will protect customers who use some of its generative AI products if they get sued for copyright infringement, the company says.

In a blog post, Google said customers using products that are now embedded with generative AI features will be protected, attempting to assuage growing fears that generative AI could run afoul of copyright rules. It specifically mentioned seven products it would legally cover: Duet AI in Workspace (including text generated in Google Docs and Gmail and images in Google Slides and Google Meet), Duet AI in Google Cloud, Vertex AI Search, Vertex AI Conversation, Vertex AI Text Embedding API, Visual Captioning on Vertex AI, and Codey APIs. Google’s Bard search tool was not mentioned.

“If you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved,” the company said."

Monday, September 25, 2023

Getty Images promises its new AI contains no copyrighted art; MIT Technology Review, September 25, 2023

MIT Technology Review; Getty Images promises its new AI contains no copyrighted art

"Getty Images is so confident its new generative AI model is free of copyrighted content that it will cover any potential intellectual-property disputes for its customers. 

The generative AI system, announced today, was built by Nvidia and is trained solely on images in Getty’s image library. It does not include logos or images that have been scraped off the internet without consent. 

“Fundamentally, it’s trained; it’s clean. It’s viable for businesses to use. We’ll stand behind that claim,” says Craig Peters, the CEO of Getty Images. Peters says companies that want to use generative AI want total legal certainty they won’t face expensive copyright lawsuits."

Wednesday, September 20, 2023

Franzen, Grisham and Other Prominent Authors Sue OpenAI; The New York Times, September 20, 2023

Alexandra Alter et al., The New York Times; Franzen, Grisham and Other Prominent Authors Sue OpenAI

"A group of prominent novelists, including John Grisham, Jonathan Franzen and Elin Hilderbrand, are joining the legal battle against OpenAI over its chatbot technology, as fears about the encroachment of artificial intelligence on creative industries continue to grow.

More than a dozen authors filed a lawsuit against OpenAI on Tuesday, accusing the company, which has been backed with billions of dollars in investment from Microsoft, of infringing on their copyrights by using their books to train its popular ChatGPT chatbot. The complaint, which was filed along with the Authors Guild, said that OpenAI’s chatbots can now produce “derivative works” that can mimic and summarize the authors’ books, potentially harming the market for authors’ work, and that the writers were neither compensated nor notified by the company."

Digimarc adds copyright information to digital data; The Verge, September 19, 2023

 Emilia David, The Verge; Digimarc adds copyright information to digital data

"Software company Digimarc will now let copyright owners add more information to their work, which the company said will improve how AI models treat copyright in training data. 

In a statement, Digimarc said its new Digimarc Validate service lets users include ownership identification in the metadata. The company said this means that when copyrighted material becomes part of a generative AI training dataset, users can point to the digital watermark with intellectual property information.

For example, an image with Digimarc Validate adds a © symbol that is machine-readable and includes information on who owns the copyright. The company said Digimarc Validate is powered by its digital watermark detection software, called SAFE, or secure, accurate, fair, and efficient, which AI companies have to buy into if they want to prevent copyrighted material with the Digimarc Validate symbol from making it to training datasets."

Tuesday, September 12, 2023

Another group of writers is suing OpenAI over copyright claims; The Verge, September 11, 2023

Emma Roth, The Verge; Another group of writers is suing OpenAI over copyright claims

"A group of writers is suing OpenAI over claims the company illegally used their works to train its AI ChatGPT chatbot, as reported earlier by Reuters. In a lawsuit filed on Friday, Michael Chabon, David Henry Hwang, Rachel Louise Snyder, and Ayelet Waldman allege OpenAI benefits and profits from the “unauthorized and illegal use” of their copyrighted content.

The lawsuit is seeking class-action status and calls out ChatGPT’s ability to summarize and analyze the content written by the authors, stating this “is only possible” if OpenAI trained its GPT large language model on their works. It adds that these outputs are actually “derivative” works that infringe on their copyrights."

Saturday, August 26, 2023

Studios’ Offer to Writers May Lead to AI-Created Scripts That Are Copyrightable; The Hollywood Reporter, August 23, 2023

 Winston Cho, The Hollywood Reporter; Studios’ Offer to Writers May Lead to AI-Created Scripts That Are Copyrightable

"But missing from the proposal, which was described as meeting the “priority concerns” of the guild, is how the studios need writers to exploit any work created by AI under existing copyright laws. That’s because works solely created by AI are not copyrightable. To be granted protection, a human would need to rewrite any AI-produced script...

By keeping AI on the table, the studios may be looking to capitalize on the intellectual property rights around works created by the tools. “If a human touches material created by generative AI, then the typical copyright protections will kick in,” a source close to the AMPTP says...

The studios may be looking toward producing of AI-generated scripts, but copyright protection is only possible for those works if they are revised by human writers. Material created solely by AI would enter the public domain upon release, potentially restricting opportunities for exploitation."

Thursday, August 24, 2023

Scraping or Stealing? A Legal Reckoning Over AI Looms; The Hollywood Reporter, August 22, 2023

Winston Cho, The Hollywood Reporter; Scraping or Stealing? A Legal Reckoning Over AI Looms

"Engineers build AI art generators by feeding AI systems, known as large language models, voluminous databases of images downloaded from the internet without licenses. The artists’ suit revolves around the argument that the practice of feeding these systems copyrighted works constitutes intellectual property theft. A finding of infringement in the case may upend how most AI systems are built in the absence of regulation placing guardrails around the industry. If the AI firms are found to have infringed on any copyrights, they may be forced to destroy datasets that have been trained on copyrighted works. They also face stiff penalties of up to $150,000 for each infringement.

AI companies maintain that their conduct is protected by fair use, which allows for the utilization of copyrighted works without permission as long as that use is transformative. The doctrine permits unlicensed use of copyrighted works under limited circumstances. The factors that determine whether a work qualifies include the purpose of the use, the degree of similarity, and the impact of the derivative work on the market for the original. Central to the artists’ case is winning the argument that the AI systems don’t create works of “transformative use,” defined as when the purpose of the copyrighted work is altered to create something with a new meaning or message."

Tuesday, July 25, 2023

The Generative AI Battle Has a Fundamental Flaw; Wired, July 25, 2023

Wired; The Generative AI Battle Has a Fundamental Flaw

"At the core of these cases, explains Sag, is the same general theory: that LLMs “copied” authors’ protected works. Yet, as Sag explained in testimony to a US Senate subcommittee hearing earlier this month, models like GPT-3.5 and GPT-4 do not “copy” work in the traditional sense. Digest would be a more appropriate verb—digesting training data to carry out their function: predicting the best next word in a sequence. “Rather than thinking of an LLM as copying the training data like a scribe in a monastery,” Sag said in his Senate testimony, “it makes more sense to think of it as learning from the training data like a student.”...

Ultimately, though, the technology is not going away, and copyright can only remedy some of its consequences. As Stephanie Bell, a research fellow at the nonprofit Partnership on AI, notes, setting a precedent where creative works can be treated like uncredited data is “very concerning.” To fully address a problem like this, the regulations AI needs aren't yet on the books."

Wednesday, July 19, 2023

US judge finds flaws in artists' lawsuit against AI companies; Reuters, July 19, 2023

Reuters; US judge finds flaws in artists' lawsuit against AI companies

"U.S. District Judge William Orrick said during a hearing in San Francisco on Wednesday that he was inclined to dismiss most of a lawsuit brought by a group of artists against generative artificial intelligence companies, though he would allow them to file a new complaint.

Orrick said that the artists should more clearly state and differentiate their claims against Stability AI, Midjourney and DeviantArt, and that they should be able to "provide more facts" about the alleged copyright infringement because they have access to Stability's relevant source code."

Monday, July 17, 2023

AI learned from their work. Now they want compensation.; The Washington Post, July 16, 2023

The Washington Post; AI learned from their work. Now they want compensation.

"Artists say the livelihoods of millions of creative workers are at stake, especially because AI tools are already being used to replace some human-made work. Mass scraping of art, writing and movies from the web for AI training is a practice creators say they never considered or consented to.

But in public appearances and in responses to lawsuits, the AI companies have argued that the use of copyrighted works to train AI falls under fair use — a concept in copyright law that creates an exception if the material is changed in a “transformative” way."

Wednesday, July 12, 2023

Google hit with class-action lawsuit over AI data scraping; Reuters, July 11, 2023

Reuters; Google hit with class-action lawsuit over AI data scraping

"Alphabet's Google (GOOGL.O) was accused in a proposed class action lawsuit on Tuesday of misusing vast amounts of personal information and copyrighted material to train its artificial intelligence systems.

The complaint, filed in San Francisco federal court by eight individuals seeking to represent millions of internet users and copyright holders, said Google's unauthorized scraping of data from websites violated their privacy and property rights."

Tuesday, July 11, 2023

What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat; VentureBeat, July 10, 2023

VentureBeat; What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat

"Legal AI issues around copyright and ‘fair use’ growing louder 

These legal issues around copyright and “fair use” are not going away — in fact, they go to the heart of what today’s LLMs are made of — that is, the training data. As I discussed last week, web scraping for massive amounts of data can arguably be described as the secret sauce of generative AI. AI chatbots like ChatGPT, LLaMA, Claude (from Anthropic) and Bard (from Google) can spit out coherent text because they were trained on massive corpora of data, mostly scraped from the internet. And as the size of today’s LLMs like GPT-4 have ballooned to hundreds of billions of tokens, so has the hunger for data."

Thursday, July 6, 2023

The copyright battles against OpenAI have begun; Quartz, July 6, 2023

Faustine Ngila, Quartz; The copyright battles against OpenAI have begun

Let the AI copyright battles begin... 

"With this latest lawsuit from Tremblay and Awad, regulators and courts will be tasked with mulling over the rules of copyright with regards to AI. They may require generative AI companies to disclose how and where they sourced their training data, letting the world peek inside the black box of these AI systems for the very first time."

Monday, July 3, 2023

Bestselling authors Mona Awad and Paul Tremblay sue OpenAI over copyright infringement; The Los Angeles Times, July 1, 2023

Emily St. Martin, The Los Angeles Times; Bestselling authors Mona Awad and Paul Tremblay sue OpenAI over copyright infringement

"Two bestselling novelists filed a suit against OpenAI in a San Francisco federal court on Wednesday, claiming in a proposed class action that the company used copyright-protected intellectual property to “train” its artificial intelligence chatbot.

Authors Mona Awad and Paul Tremblay claim that ChatGPT was trained in part by “ingesting” their novels without their consent."

Thursday, June 29, 2023

Authors Sue OpenAI Claiming Mass Copyright Infringement of Hundreds of Thousands of Novels; The Hollywood Reporter, June 29, 2023

Winston Cho, The Hollywood Reporter; Authors Sue OpenAI Claiming Mass Copyright Infringement of Hundreds of Thousands of Novels

"Another lawsuit has been filed against OpenAI over its unauthorized collection of information across the web to train its artificial intelligence chatbot, this time by authors who say ChatGPT infringes on copyrights to their novels.

The proposed class action filed in San Francisco federal court on Wednesday alleges that OpenAI “relied on harvesting mass quantities” of copyright-protected works “without consent, without credit, and without compensation.” It seeks a court order that the company infringed on writers’ works when it illegally downloaded copies of novels to train its AI system and that ChatGPT’s answers constitute infringement."

Sunday, June 18, 2023

Generative AI is a minefield for copyright law; The Conversation, June 15, 2023

JD-PhD Student, Massachusetts Institute of Technology (MIT); Lecturer on Law, Harvard Law School; PhD Student in Media Arts and Sciences, Massachusetts Institute of Technology (MIT), The Conversation; Generative AI is a minefield for copyright law

"While copyright law tends to favor an all-or-nothing approach, scholars at Harvard Law School have proposed new models of joint ownership that allow artists to gain some rights in outputs that resemble their works.

In many ways, generative AI is yet another creative tool that allows a new group of people access to image-making, just like cameras, paintbrushes or Adobe Photoshop. But a key difference is this new set of tools relies explicitly on training data, and therefore creative contributions cannot easily be traced back to a single artist. 

The ways in which existing laws are interpreted or reformed – and whether generative AI is appropriately treated as the tool it is – will have real consequences for the future of creative expression."

Wednesday, June 7, 2023

Senate aims to navigate conflict between copyright and training AI; Washington Examiner, June 7, 2023

Christopher Hutton, Washington Examiner; Senate aims to navigate conflict between copyright and training AI

"The Senate is set this week to begin addressing the tension between using images and data to train artificial intelligence and existing copyright law...

The Senate Judiciary Committee is scheduled on Wednesday to host the first of several hearings on AI and intellectual property. This one will deal with "Patents, Innovation, and Competition." The hearing will feature professors from the University of California, Los Angeles and Laura Sheridan, Google's head of patent policy."

Thursday, May 4, 2023

OpenAI's ChatGPT may face a copyright quagmire after 'memorizing' these books; The Register, May 3, 2023

Thomas Claburn, The Register; OpenAI's ChatGPT may face a copyright quagmire after 'memorizing' these books

"Tyler Ochoa, a professor in the Law department at Santa Clara University in California, told The Register he fully expects to see lawsuits against the makers of large language models that generate text, including OpenAI, Google, and others.

Ochoa said the copyright issues with AI text generation are exactly the same as the issues with AI image generation. First: is copying large amounts of text or images for training the model fair use? The answer to that, he said, is probably yes.

Second: if the model generates output that's too similar to the input – what the paper refers to as "memorization" – is that copyright infringement? The answer to that, he said, is almost certainly yes.

And third: if the output of an AI text generator is not a copy of an existing text, is it protected by copyright?

Under current law, said Ochoa, the answer is no – because US copyright law requires human creativity, though some countries will disagree and will protect AI-generated works. However, he added, activities like selecting, arranging, and modifying AI model output makes copyright protection more plausible."