Showing posts with label AI training data. Show all posts

Wednesday, January 14, 2026

Britain seeks 'reset' in copyright battle between AI and creators; Reuters, January 13, 2026

Reuters; Britain seeks 'reset' in copyright battle between AI and creators

"British technology minister Liz Kendall said on Tuesday the government was seeking a "reset" on plans to overhaul copyright rules to accommodate artificial intelligence, pledging to protect creators while unlocking AI's economic potential.

Creative industries worldwide are grappling with legal and ethical challenges posed by AI systems that generate original content after being trained on popular works, often without compensating the original creators."

 

Tuesday, January 13, 2026

‘Clock Is Ticking’ For Creators On AI Content Copyright Claims, Experts Warn; Forbes, January 9, 2026

 Rob Salkowitz, Forbes; ‘Clock Is Ticking’ For Creators On AI Content Copyright Claims, Experts Warn

"Despite this string of successes, creators like BT caution that content owners need to move quickly to secure any kind of terms. “A lot of artists have their heads in the sand with respect to AI,” he said. “The fact is, if they don’t come to some kind of agreement, they may end up with nothing.”

The concern is that AI models are increasingly being trained on synthetic data: that is, on the output of AI systems, rather than on content attributable to any individual creator or rights owner. Gartner estimates that 75% of AI training data in 2026 will be synthetic. That number could hit 100% by 2030. Once the tech companies no longer need human-produced content, they will stop paying for it.

“The quality of outputs from AI systems has been improving dramatically, which means that it is possible to train on synthetic data without risking model collapse,” said Dr. Daniela Braga, founder and CEO of the data training firm Defined.ai, in a separate interview at CES. “The window is definitely closing for individual rights owners to secure favorable terms.”

Other experts suggest that these claims may be overstated.

Braga says the best way creators can protect themselves is to do business with ethical companies willing to provide compensation for high-quality human-produced content and represent the superior value of that content to their customers. As models grow in capabilities, the need will shift from sheer volume of data to data that is appropriately tagged and annotated to fit easily into specific use cases.

There remain some profound questions around the sustainability of AI from a business standpoint, with demand for services among enterprise and consumers lagging the massive, and massively expensive, build-out of capacity. For some artists opposed to generative AI in its entirety, there may be the temptation to wait it out until the bubble bursts. After all, these artists created their work to be enjoyed by humans, not to be consumed in bulk by machines threatening their livelihoods. In light of those objections, the prospect of a meager payout might seem unappealing."

Friday, January 9, 2026

Thursday, January 8, 2026

OpenAI Must Turn Over 20 Million ChatGPT Logs, Judge Affirms; Bloomberg Law, January 5, 2026

Bloomberg Law; OpenAI Must Turn Over 20 Million ChatGPT Logs, Judge Affirms

"OpenAI Inc. will have to turn over 20 million anonymized ChatGPT logs in a consolidated AI copyright case after it failed to convince a federal judge to throw out a magistrate judge’s order the company said insufficiently weighed privacy concerns.

Magistrate Judge Ona T. Wang sufficiently considered privacy concerns against the material’s relevance to the ongoing litigation in her discovery ruling in favor of news organization plaintiffs in five lawsuits, District Judge Sidney H. Stein said in an order Monday. She rejected OpenAI’s arguments it should be allowed to run a search of the 20 million-log sample and produce conversations implicating the plaintiffs’ works, saying no case law requires the court to order the least burdensome discovery possible."

Monday, January 5, 2026

AI copyright battles enter pivotal year as US courts weigh fair use; Reuters, January 5, 2026

Reuters; AI copyright battles enter pivotal year as US courts weigh fair use

"The sprawling legal fight over tech companies' vast copying of copyrighted material to train their artificial intelligence systems could be entering a decisive phase in 2026.

After a string of fresh lawsuits and a landmark settlement in 2025, the new year promises to bring a wave of rulings that could define how U.S. copyright law applies to generative AI. At stake is whether companies like OpenAI, Google and Meta can rely on the legal doctrine of fair use to shield themselves from liability – or if they must reimburse copyright holders, which could cost billions."

Monday, December 22, 2025

OpenAI, Anthropic, xAI Hit With Copyright Suit from Writers; Bloomberg Law, December 22, 2025

 Annelise Levy, Bloomberg Law; OpenAI, Anthropic, xAI Hit With Copyright Suit from Writers

"Writers including Pulitzer Prize-winning journalist John Carreyrou filed a copyright lawsuit accusing six AI giants of using pirated copies of their books to train large language models.

The complaint, filed Monday in the US District Court for the Northern District of California, claims Anthropic PBC, Google LLC, OpenAI Inc., Meta Platforms Inc., xAI Corp., and Perplexity AI Inc. committed a “deliberate act of theft.”

It is the first copyright lawsuit against xAI over its training process, and the first suit brought by authors against Perplexity...

Carreyrou is among the authors who opted out of a $1.5 billion class-action settlement with Anthropic."

Sunday, December 21, 2025

Launch, Train, Settle: How Suno And Udio’s Licensing Deals Made Copyright Infringement Profitable; Forbes, December 18, 2025

Virginie Berger, Forbes; Launch, Train, Settle: How Suno And Udio’s Licensing Deals Made Copyright Infringement Profitable

"The Precedent That Pays

Perhaps most concerning is what these partial settlements teach other AI companies: copyright infringement can be a viable business strategy, as long as you only have to answer to those with the resources to sue.

The calculus is straightforward. Build your product using copyrighted material without permission. Grow quickly while competitors who might try to license properly struggle with costs and complexity.

If you get big enough, those with sufficient resources will eventually sue. At that point, negotiate from strength because your technology is already deployed, your users are already dependent on it, and dismantling what you've built would be costly.

The worst case isn't court-ordered damages or shutdown anymore but will be a licensing deal where you finally pay something. But far less than you would have paid to license properly from the start, and only to the major players who could force you to the table. And you keep operating with legitimacy.

Both Suno and Udio can now market themselves as "responsibly licensed" platforms, pointing to their deals with major labels as proof of legitimacy. The narrative shifts from "they stole content to build this" to "they're innovative partners in the future of music.""

Australian culture, resources and democracy for $4,300 a year? Thanks for the offer, tech bros, but no thanks; The Guardian, December 15, 2025

The Guardian; Australian culture, resources and democracy for $4,300 a year? Thanks for the offer, tech bros, but no thanks

"According to the Tech Council, AI will deliver $115bn in annual productivity (or about $4,300 per person), rubbery figures generated by industry-commissioned research based on estimates on hours saved with no regard for jobs lost, the distribution of the promised dividend benefit or how the profits will flow.

In return for this ill-defined bounty, Farquhar says our government will need to allow the tech industry to do three things: build a data and text mining exemption to copyright law, rapidly scale data centre infrastructure and allow foreign companies to use these centres without regard for local laws. This is a proposition that demands closer scrutiny.

The use of copyrighted content to train AI has been a burning issue since 2023 when a massive data dredge saw more than 190,000 authors (including me) have our works plundered without our consent to train AI. Musicians and artists too have had their work scraped and repurposed.

This theft has been critical in training the large language models to portray something approaching empathy. It has also allowed paid users to take this stolen content and ape creators, devaluing and diminishing their work in the process. Nick Cave has described this as “replication as travesty”, noting “songs arise out of suffering … data doesn’t suffer. ChatGPT has no inner being, it has been nowhere, it has endured nothing.”

The sense of grievance among creators over the erasure of culture is wide and deep. A wave of creators from Peter Garrett to Tina Arena, Anna Funder and Trent Dalton have determined this is the moment to take a stand.

It is not just the performers; journalists, academics, voiceover and visual artists are all being replaced by shittier but cheaper automated products built on the theft of their labour, undermining the integrity of their work and will ultimately take their jobs.

Like fossil fuels, what is being extracted and consumed is the sum of our accumulated history. It goes from metaphor to literal when it comes to the second plank of Farquhar’s pitch: massive spending on industrial infrastructure to accommodate AI.

This imperative to power AI is the justification used by Donald Trump to recharge the mining of fossil fuels, while the industry is beating the “modular nuclear” drum for a cleaner AI revolution. Meanwhile, the OpenAI CEO, Sam Altman, is reassuring us that we don’t need to stress because AI will solve climate change anyway!

The third and final element of Farquhar’s pitch is probably its most revealing. If Australia wants to build this AI nirvana, foreign nations should be given diplomatic immunity for the data centres built and operated here. This quaint notion of the “data embassy” overriding national sovereignty reinforces a growing sense that the tech sector is moving beyond the idea of the nation state governing corporations to that of a modern imperial power.

That’s the premise of Karen Hao’s book The Empire of AI, which chronicles the rise of OpenAI and the choices it made to trade off safety and the public good in pursuit of scale and profit."

Proposal to allow use of Australian copyrighted material to train AI abandoned after backlash; The Guardian, December 19, 2025

The Guardian; Proposal to allow use of Australian copyrighted material to train AI abandoned after backlash

"The Productivity Commission has abandoned a proposal to allow tech companies to mine copyrighted material to train artificial intelligence models, after a fierce backlash from the creative industries.

Instead, the government’s top economic advisory body recommended the government wait three years before deciding whether to establish an independent review of Australian copyright settings and the impact of the disruptive new technology...

In its interim report on the digital economy, the commission floated the idea of granting a “fair dealing” exemption to copyright rules that would allow AI companies to mine data and text to develop their large language models...

The furious response from creative industries to the commission’s idea included music industry bodies saying it would “legitimise digital piracy under guise of productivity”."

Monday, December 15, 2025

Government's AI consultation finds just 3% support copyright exception; The Bookseller, December 15, 2025

Maia Snow, The Bookseller; Government's AI consultation finds just 3% support copyright exception

"The initial results of the consultation found that the majority of respondents (88%) backed licences being required in all cases where data was being used for AI training. Just 3% of respondents supported the government’s preferred options, which would allow data mining by AI companies and require rights holders to opt-out."

Sunday, December 14, 2025

The Disney-OpenAI tie-up has huge implications for intellectual property; Fast Company, December 11, 2025

Chris Stokel-Walker, Fast Company; The Disney-OpenAI tie-up has huge implications for intellectual property

"Walt Disney and OpenAI make for very odd bedfellows: The former is one of the most-recognized brands among children under the age of 18. The near-$200 billion company’s value has been derived from more than a century of aggressive safeguarding of its intellectual property and keeping the magic alive among innocent children.

OpenAI, which celebrated its first decade of existence this week, is best known for upending creativity, the economy, and society with its flagship product, ChatGPT. And in the last two months, it has said it wants to get to a place where its adult users can use its tech to create erotica.

So what the hell should we make of a just-announced deal between the two that will allow ChatGPT and Sora users to create images and videos of more than 200 characters, from Mickey and Minnie Mouse to the Mandalorian, starting from early 2026?"


Saturday, December 13, 2025

Authors Ask to Update Meta AI Copyright Suit With Torrent Claim; Bloomberg Law, December 12, 2025

Bloomberg Law; Authors Ask to Update Meta AI Copyright Suit With Torrent Claim

"Authors in a putative class action copyright suit against Meta Platforms Inc. asked a federal judge for permission to amend their complaint to add a claim over Meta’s use of peer-to-peer file-sharing unveiled in discovery."

Thursday, December 11, 2025

Disney says Google AI infringes copyright “on a massive scale”; Ars Technica, December 11, 2025

Ryan Whitwam, Ars Technica; Disney says Google AI infringes copyright “on a massive scale”

"Disney has sent a cease and desist to Google, alleging the company’s AI tools are infringing Disney’s copyrights “on a massive scale.”

According to the letter, Google is violating the entertainment conglomerate’s intellectual property in multiple ways. The legal notice says Google has copied a “large corpus” of Disney’s works to train its gen AI models, which is believable, as Google’s image and video models will happily produce popular Disney characters—they couldn’t do that without feeding the models lots of Disney data.

The C&D also takes issue with Google for distributing “copies of its protected works” to consumers."

Has Cambridge-based AI music upstart Suno 'gone legit'?; WBUR, December 11, 2025

WBUR; Has Cambridge-based AI music upstart Suno 'gone legit'?

"The Cambridge-based AI music company Suno, which has been besieged by lawsuits from record labels, is now teaming up with behemoth label Warner Music. Under a new partnership, Warner will license music in its catalogue for use by Suno's AI.

Copyright law experts Peter Karol and Bhamati Viswanathan join WBUR's Morning Edition to discuss what the deal between Suno and Warner Music means for the future of intellectual property."

Wednesday, December 10, 2025

EU investigates Google over AI-generated summaries in search results; BBC, December 8, 2025

Liv McMahon, BBC; EU investigates Google over AI-generated summaries in search results

"The Commission's investigation comes down to whether Google has used the work of other people published online to build its own AI tools which it can profit from."

AI firms began to feel the legal wrath of copyright holders in 2025; NewScientist, December 10, 2025

Chris Stokel-Walker, NewScientist; AI firms began to feel the legal wrath of copyright holders in 2025

"The three years since the release of ChatGPT, OpenAI’s generative AI chatbot, have seen huge changes in every part of our lives. But one area that hasn’t changed – or at least, is still trying to maintain pre-AI norms – is the upholding of copyright law.

It is no secret that leading AI firms built their models by hoovering up data, including copyrighted material, from the internet without asking for permission first. This year, major copyright holders struck back, buffeting AI companies with a range of lawsuits alleging copyright infringement."

Saturday, December 6, 2025

The New York Times sues Perplexity for producing ‘verbatim’ copies of its work; The Verge, December 5, 2025

Emma Roth, The Verge; The New York Times sues Perplexity for producing ‘verbatim’ copies of its work

"The New York Times has escalated its legal battle against the AI startup Perplexity, as it’s now suing the AI “answer engine” for allegedly producing and profiting from responses that are “verbatim or substantially similar copies” of the publication’s work.

The lawsuit, filed in a New York federal court on Friday, claims Perplexity “unlawfully crawls, scrapes, copies, and distributes” content from the NYT. It comes after the outlet’s repeated demands for Perplexity to stop using content from its website, as the NYT sent cease-and-desist notices to the AI startup last year and most recently in July, according to the lawsuit. The Chicago Tribune also filed a copyright lawsuit against Perplexity on Thursday."

Friday, December 5, 2025

The New York Times is suing Perplexity for copyright infringement; TechCrunch, December 5, 2025

Rebecca Bellan, TechCrunch; The New York Times is suing Perplexity for copyright infringement

"The New York Times filed suit Friday against AI search startup Perplexity for copyright infringement, its second lawsuit against an AI company. The Times joins several media outlets suing Perplexity, including the Chicago Tribune, which also filed suit this week."

Thursday, December 4, 2025

OpenAI loses fight to keep ChatGPT logs secret in copyright case; Reuters, December 3, 2025

Reuters; OpenAI loses fight to keep ChatGPT logs secret in copyright case

"OpenAI must produce millions of anonymized chat logs from ChatGPT users in its high-stakes copyright dispute with the New York Times and other news outlets, a federal judge in Manhattan ruled.

U.S. Magistrate Judge Ona Wang in a decision made public on Wednesday said that the 20 million logs were relevant to the outlets' claims and that handing them over would not risk violating users' privacy."

Lawsuit or License?; Columbia Journalism Review, December 4, 2025

Columbia Journalism Review; Lawsuit or License?

"Today, the Tow Center for Digital Journalism is releasing a tracker that monitors developments between news publishers and AI companies—including lawsuits, deals, and grants—based on publicly available information."