Showing posts with label AI training data. Show all posts
Showing posts with label AI training data. Show all posts

Thursday, March 27, 2025

Judge allows 'New York Times' copyright case against OpenAI to go forward; NPR, March 27, 2025

, NPR ; Judge allows 'New York Times' copyright case against OpenAI to go forward

"A federal judge on Wednesday rejected OpenAI's request to toss out a copyright lawsuit from The New York Times that alleges that the tech company exploited the newspaper's content without permission or payment.

In an order allowing the lawsuit to go forward, Judge Sidney Stein, of the Southern District of New York, narrowed the scope of the lawsuit but allowed the case's main copyright infringement claims to go forward.

Stein did not immediately release an opinion but promised one would come "expeditiously."

The decision is a victory for the newspaper, which has joined forces with other publishers, including The New York Daily News and the Center for Investigative Reporting, to challenge the way that OpenAI collected vast amounts of data from the web to train its popular artificial intelligence service, ChatGPT."

Wednesday, March 26, 2025

Richard Osman urges writers to ‘have a good go’ at Meta over breaches of copyright; The Guardian, March 25, 2025

 , The Guardian; Richard Osman urges writers to ‘have a good go’ at Meta over breaches of copyright

"Richard Osman has said that writers will “have a good go” at taking on Meta after it emerged that the company used a notorious database believed to contain pirated books to train artificial intelligence.

“Copyright law is not complicated at all,” the author of The Thursday Murder Club series wrote in a statement on X on Sunday evening. “If you want to use an author’s work you need to ask for permission. If you use it without permission you’re breaking the law. It’s so simple.”

In January, it emerged that Mark Zuckerberg approved his company’s use of The Library Genesis dataset, a “shadow library” that originated in Russia and contains more than 7.5m books. In 2024 a New York federal court ordered LibGen’s anonymous operators to pay a group of publishers $30m (£24m) in damages for copyright infringement. Last week, the Atlantic republished a searchable database of the titles contained in LibGen. In response, authors and writers’ organisations have rallied against Meta’s use of copyrighted works."

Search LibGen, the Pirated-Books Database That Meta Used to Train AI; The Atlantic, March 20, 2025

Alex Reisner , The Atlantic; Search LibGen, the Pirated-Books Database That Meta Used to Train AI

"Editor’s note: This search tool is part of The Atlantic’s investigation into the Library Genesis data set. You can read an analysis about LibGen and its contents here. Find The Atlantic’s search tool for movie and television writing used to train AI here."

Anthropic wins early round in music publishers' AI copyright case; Reuters, March 26, 2025

 , Reuters; Anthropic wins early round in music publishers' AI copyright case

"Artificial intelligence company Anthropic convinced a California federal judge on Tuesday to reject a preliminary bid to block it from using lyrics owned by Universal Music Group and other music publishers to train its AI-powered chatbot Claude.

U.S. District Judge Eumi Lee said that the publishers' request was too broad and that they failed to show Anthropic's conduct caused them "irreparable harm."

Tuesday, March 25, 2025

Ben Stiller, Mark Ruffalo and More Than 400 Hollywood Names Urge Trump to Not Let AI Companies ‘Exploit’ Copyrighted Works; Variety, March 17, 2025

Todd Spangler , Variety; Ben Stiller, Mark Ruffalo and More Than 400 Hollywood Names Urge Trump to Not Let AI Companies ‘Exploit’ Copyrighted Works

"More than 400 Hollywood creative leaders signed an open letter to the Trump White House’s Office of Science and Technology Policy, urging the administration to not roll back copyright protections at the behest of AI companies.

The filmmakers, writers, actors, musicians and others — which included Ben Stiller, Mark Ruffalo, Cynthia Erivo, Cate Blanchett, Cord Jefferson, Paul McCartney, Ron Howard and Taika Waititi — were submitting comments for the Trump administration’s U.S. AI Action Plan⁠. The letter specifically was penned in response to recent submissions to the Office of Science and Technology Policy from OpenAI and Google, which asserted that U.S. copyright law allows (or should allow) allow AI companies to train their system on copyrighted works without obtaining permission from (or compensating) rights holders."

Monday, March 24, 2025

How to tell when AI models infringe copyright; The Washington Post, March 24, 2024

, The Washington Post; How to tell when AI models infringe copyright

"Fair use has been a big part of AI companies’ defense. No matter how well a plaintiff manages to argue that a given AI model infringes copyright, the AI maker can usually point to the doctrine of fair use, which requires consideration of multiple factors, including the purpose of the use (here, criticism, comment and research are favored) and the effect of the use on the marketplace. If, in using a copied work, an AI model adds “something new,” it is probably in the clear."

Should AI be treated the same way as people are when it comes to copyright law? ; The Hill, March 24, 2025

 NICHOLAS CREEL, The Hill ; Should AI be treated the same way as people are when it comes to copyright law? 

"The New York Times’s lawsuit against OpenAI and Microsoft highlights an uncomfortable contradiction in how we view creativity and learning. While the Times accuses these companies of copyright infringement for training AI on their content, this ignores a fundamental truth: AI systems learn exactly as humans do, by absorbing, synthesizing and transforming existing knowledge into something new."

Friday, March 21, 2025

AI firms push to use copyrighted content freely; Axios, March 20, 2025

 Ina Fried, Axios; AI firms push to use copyrighted content freely

"A sharp divide over AI engines' free use of copyrighted material has emerged as a key conflict among the firms and groups that recently flooded the White House with advice on its forthcoming "AI Action Plan."

Why it matters: Copyright infringement claims were among the first legal challenges following ChatGPT's launch, with multiple lawsuits now winding their way through the courts.

Driving the news: In their White House memos, OpenAI and Google argue that their  use of copyrighted material for AI is a matter of national security — and if that use is limited, China will gain an unfair edge in the AI race."

Sunday, March 16, 2025

The AI Copyright Battle: Why OpenAI And Google Are Pushing For Fair Use; Forbes, March 15, 2025

Virginie Berger , Forbes; The AI Copyright Battle: Why OpenAI And Google Are Pushing For Fair Use

"Furthermore, the ongoing lawsuits against AI firms could serve as a necessary correction to push the industry toward genuinely intelligent machine learning models instead of data-compression-based generators masquerading as intelligence. If legal challenges force AI firms to rethink their reliance on copyrighted content, it could spur innovation toward creating more advanced, ethically sourced AI systems...

Recommendations: Finding a Sustainable Balance

A sustainable solution must reconcile technological innovation with creators' economic interests. Policymakers should develop clear federal standards specifying fair use parameters for AI training, considering solutions such as:

  • Licensing and Royalties: Transparent licensing arrangements compensating creators whose work is integral to AI datasets.
  • Curated Datasets: Government or industry-managed datasets explicitly approved for AI training, ensuring fair compensation.
  • Regulated Exceptions: Clear legal definitions distinguishing transformative use in AI training contexts.

These nuanced policies could encourage innovation without sacrificing creators’ rights.

The lobbying by OpenAI and Google reveals broader tensions between rapid technological growth and ethical accountability. While national security concerns warrant careful consideration, they must not justify irresponsible regulation or ethical compromises. A balanced approach, preserving innovation, protecting creators’ rights, and ensuring sustainable and ethical AI development, is critical for future global competitiveness and societal fairness."

OpenAI declares AI race “over” if training on copyrighted works isn’t fair use; Ars Technica, March 13, 2025

ASHLEY BELANGER  , Ars Technica; OpenAI declares AI race “over” if training on copyrighted works isn’t fair use

"OpenAI is hoping that Donald Trump's AI Action Plan, due out this July, will settle copyright debates by declaring AI training fair use—paving the way for AI companies' unfettered access to training data that OpenAI claims is critical to defeat China in the AI race.

Currently, courts are mulling whether AI training is fair use, as rights holders say that AI models trained on creative works threaten to replace them in markets and water down humanity's creative output overall.

OpenAI is just one AI company fighting with rights holders in several dozen lawsuits, arguing that AI transforms copyrighted works it trains on and alleging that AI outputs aren't substitutes for original works.

So far, one landmark ruling favored rights holders, with a judge declaring AI training is not fair use, as AI outputs clearly threatened to replace Thomson-Reuters' legal research firm Westlaw in the market, Wired reported. But OpenAI now appears to be looking to Trump to avoid a similar outcome in its lawsuits, including a major suit brought by The New York Times."

Friday, March 14, 2025

French publishers and authors sue Meta over copyright works used in AI training; AP, March 12, 2025

 KELVIN CHAN, AP; French publishers and authors sue Meta over copyright works used in AI training

"French publishers and authors said Wednesday they’re taking Meta to court, accusing the social media company of using their works without permission to train its artificial intelligence model. 

Three trade groups said they were launching legal action against Meta in a Paris court over what they said was the company’s “massive use of copyrighted works without authorization” to train its generative AI model. 

The National Publishing Union, which represents book publishers, has noted that “numerous works” from its members are turning up in Meta’s data pool, the group’s president, Vincent Montagne, said in a joint statement."

Tuesday, March 11, 2025

Judge says Meta must defend claim it stripped copyright info from Llama's training fodder; The Register, March 11, 2025

Thomas Claburn , The Register; Judge says Meta must defend claim it stripped copyright info from Llama's training fodder

"A judge has found Meta must answer a claim it allegedly removed so-called copyright management information from material used to train its AI models.

The Friday ruling by Judge Vince Chhabria concerned the case Kadrey et al vs Meta Platforms, filed in July 2023 in a San Francisco federal court as a proposed class action by authors Richard Kadrey, Sarah Silverman, and Christopher Golden, who reckon the Instagram titan's use of their work to train its neural networks was illegal.

Their case burbled along until January 2025 when the plaintiffs made the explosive allegation that Meta knew it used copyrighted material for training, and that its AI models would therefore produce results that included copyright management information (CMI) – the fancy term for things like the creator of a copyrighted work, its license and terms of use, its date of creation, and so on, that accompany copyrighted material.

The miffed scribes alleged Meta therefore removed all of this copyright info from the works it used to train its models so users wouldn’t be made aware the results they saw stemmed from copyrighted stuff."

Saturday, March 1, 2025

Prioritise artists over tech in AI copyright debate, MPs say; The Guardian, February 26, 2025

, The Guardian; Prioritise artists over tech in AI copyright debate, MPs say

"Two cross-party committees of MPs have urged the government to prioritise ensuring that creators are fairly remunerated for their creative work over making it easy to train artificial intelligence models.

The MPs argued there needed to be more transparency around the vast amounts of data used to train generative AI models, and urged the government not to press ahead with plans to require creators to opt out of having their data used.

The government’s preferred solution to the tension between AI and copyright law is to allow AI companies to train the models on copyrighted work by giving them an exception for “text and data mining”, while giving creatives the opportunity to opt out through a “rights reservation” system.

The chair of the culture, media and sport committee, Caroline Dinenage, said there had been a “groundswell of concern from across the creative industries” in response to the proposals, which “illustrates the scale of the threat artists face from artificial intelligence pilfering the fruits of their hard-earned success without permission”.

She added that making creative works “fair game unless creators say so” was akin to “burglars being allowed into your house unless there’s a big sign on your front door expressly telling them that thievery isn’t allowed”."


Thursday, February 27, 2025

An AI Maker Was Just Found Liable for Copyright Infringement. What Does This Portend for Content Creators and AI Makers?; The Federalist Society, February 25, 2025

  , The Federalist Society; An AI Maker Was Just Found Liable for Copyright Infringement. What Does This Portend for Content Creators and AI Makers?

"In a case decided on February 11, the makers of generative AI (GenAI), such as ChatGPT, lost the first legal battle in the war over whether they commit copyright infringement by using the material of others as training data without permission. The case is called Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.

If other courts follow this ruling, the cost of building and selling GenAI services will dramatically increase. Such businesses are already losing money.

The ruling could also empower content creators, such as writers, to deny the use of their material to train GenAIs or to demand license fees. Some creators might be unwilling to license use of their material for training AIs due to fear that GenAI will destroy demand for their work."

Wednesday, February 26, 2025

UK newspapers launch campaign against AI copyright plans; Independent, February 25, 2025

Martyn Landi, Independent; UK newspapers launch campaign against AI copyright plans

"Some of the UK’s biggest newspapers have used a coordinated campaign across their front pages to raise their concerns about AI’s impact on the creative industries.

Special wraps appeared on Tuesday’s editions of the Daily Express, Daily Mail, The Mirror, the Daily Star, The i, The Sun, and The Times – as well as a number of regional titles – criticising a Government consultation around possible exemptions being added to copyright law for training AI models.

The proposals would allow tech firms to use copyrighted material from creatives and publishers without having to pay or gain a licence, or reimbursing creatives for using their work."

Tuesday, February 25, 2025

Musicians release silent album to protest UK's AI copyright changes; Reuters, February 25, 2025

 , Reuters; Musicians release silent album to protest UK's AI copyright changes

"More than 1,000 musicians, including Kate Bush and Cat Stevens, on Tuesday released a silent album to protest proposed changes to Britain's copyright laws, which could allow tech firms to train artificial intelligence models using their work."

Monday, February 24, 2025

Copyright 'sell-out' will silence British musicians, says BRIAN MAY; Daily Mail, February 23, 2025

Andy Behring , Daily Mail; Copyright 'sell-out' will silence British musicians, says BRIAN MAY

"No one will make music in Britain any more if Labour's AI copyright proposal succeeds, Sir Brian May warned last night as he backed the Daily Mail's campaign against it.

The Queen guitarist said he feared it may already be 'too late' because 'monstrously arrogant' Big Tech barons have already carried out an industrial-scale 'theft' of Britain's cultural genius.

He called on the Government to apply the brakes before the next chapter of Britain's rich cultural heritage – which includes Shakespeare, Chaucer, James Bond, The Beatles and Britpop – is nipped in the bud thanks to Sir Keir Starmer's copyright 'sell-out'...

Sir Brian said: 'My fear is that it's already too late – this theft has already been performed and is unstoppable, like so many incursions that the monstrously arrogant billionaire owners of Al and social media are making into our lives. The future is already forever changed."

Thursday, February 20, 2025

AI and Copyright: Expanding Copyright Hurts Everyone—Here’s What to Do Instead; Electronic Frontier Foundation (EFF), February 19, 2025

TORI NOBLE, Electronic Frontier Foundation (EFF); AI and Copyright: Expanding Copyright Hurts Everyone—Here’s What to Do Instead


[Kip Currier: No, not everyone. Not requiring Big Tech to figure out a way to fairly license or get permission to use the copyrighted works of creators unjustly advantages these deep pocketed corporations. It also inequitably disadvantages the economic and creative interests of the human beings who labor to create copyrightable content -- authors, songwriters, visual artists, and many others.

The tell is that many of these same Big Tech companies are only too willing to file copyright infringement lawsuits against anyone whom they allege is infringing their AI content to create competing products and services.]


[Excerpt]


"Threats to Socially Valuable Research and Innovation 

Requiring researchers to license fair uses of AI training data could make socially valuable research based on machine learning (ML) and even text and data mining (TDM) prohibitively complicated and expensive, if not impossible. Researchers have relied on fair use to conduct TDM research for a decade, leading to important advancements in myriad fields. However, licensing the vast quantity of works that high-quality TDM research requires is frequently cost-prohibitive and practically infeasible.  

Fair use protects ML and TDM research for good reason. Without fair use, copyright would hinder important scientific advancements that benefit all of us. Empirical studies back this up: research using TDM methodologies are more common in countries that protect TDM research from copyright control; in countries that don’t, copyright restrictions stymie beneficial research. It’s easy to see why: it would be impossible to identify and negotiate with millions of different copyright owners to analyze, say, text from the internet."

Monday, February 17, 2025

Copyright battles loom over artists and AI; Financial Times, February 16, 2025

louise.lucas@ft.com, Financial Times ; Copyright battles loom over artists and AI

"Artists are the latest creative industry to gripe about the exploitative nature of artificial intelligence. More than 3,000 have written to protest against plans by Christie’s to auction art created using AI."

Sunday, February 16, 2025

Court filings show Meta paused efforts to license books for AI training; TechCrunch, February 14, 3025

Kyle Wiggers, TechCrunch; Court filings show Meta paused efforts to license books for AI training

"According to one transcript, Sy Choudhury, who leads Meta’s AI partnership initiatives, said that Meta’s outreach to various publishers was met with “very slow uptake in engagement and interest.”

“I don’t recall the entire list, but I remember we had made a long list from initially scouring the Internet of top publishers, et cetera,” Choudhury said, per the transcript, “and we didn’t get contact and feedback from — from a lot of our cold call outreaches to try to establish contact.”

Choudhury added, “There were a few, like, that did, you know, engage, but not many.”

According to the court transcripts, Meta paused certain AI-related book licensing efforts in early April 2023 after encountering “timing” and other logistical setbacks. Choudhury said some publishers, in particular fiction book publishers, turned out to not in fact have the rights to the content that Meta was considering licensing, per a transcript.

“I’d like to point out that the — in the fiction category, we quickly learned from the business development team that most of the publishers we were talking to, they themselves were representing that they did not have, actually, the rights to license the data to us,” Choudhury said. “And so it would take a long time to engage with all their authors.”"