Showing posts with label AI training data. Show all posts

Monday, March 16, 2026

This Bill Would Force AI Companies to Disclose Copyrighted Works; PetaPixel, March 16, 2026

Pesala Bandara, PetaPixel; This Bill Would Force AI Companies to Disclose Copyrighted Works

"U.S. Senators Adam Schiff, a Democrat from California, and John Curtis, a Republican from Utah, have introduced the Copyright Labeling and Ethical AI Reporting Act, known as the CLEAR Act. The proposed legislation would require companies developing AI models to report when copyrighted material is used to train those systems.

If passed, the legislation could increase transparency around the material used to train generative AI systems, including copyrighted photographs."

UK to rule out sweeping AI copyright overhaul; Politico, March 11, 2026

JOSEPH BAMBRIDGE, Politico; UK to rule out sweeping AI copyright overhaul 

The U.K. will rule out making creatives actively opt out of having their copyrighted material scraped by AI companies.

"The U.K. government will rule out sweeping reform of its copyright laws in a highly-anticipated policy update next week, according to three people briefed on government thinking and granted anonymity to speak freely. 

The people said the update, due by March 18, will state the government does not plan to take forward work on an “opt out” model, whereby rights holders would have to explicitly say they do not want their work used to train AI models. 


It comes amid intense pressure from rights holders and lawmakers not to pursue the “opt out” policy. The government previously said this was its “preferred option” to facilitate AI innovation in the U.K., before ministers were forced to row back."

Sunday, March 15, 2026

Music Copyright in the Gen AI Age: Where Are We Now?; Brooklyn Sports & Entertainment Law Blog, February 11, 2026

Sam Woods, Brooklyn Sports & Entertainment Law Blog; Music Copyright in the Gen AI Age: Where Are We Now?

"Imagine you are a musician who has dedicated years of your life creating an album or EP — tinkering with the production, revising lyrics, finding the perfect samples— and now, you have finally shared your art with the world and are thrilled with the project’s success. However, while scrolling on TikTok a few months later, you hear some familiar audio. Wait a minute, is that one of your songs? No… not quite, but why does it sound so similar? Turns out, the song was created using artificial intelligence (“AI”)."

AI is dressing up greed as progress on creative rights; Financial Times, March 14, 2026

Financial Times; AI is dressing up greed as progress on creative rights

"At this week’s London Book Fair, a lot of people were walking around with one particular title wedged under their arms. Called Don’t Steal This Book, its pages are empty apart from the names of thousands of authors, including Kazuo Ishiguro and Richard Osman. It’s a chilling protest against the rampant theft of creative work by tech firms, which could leave future artists unable to earn a living."

Saturday, March 14, 2026

The Guardian view on changes to copyright laws: authors should be protected over big tech; The Guardian, March 13, 2026

The Guardian; The Guardian view on changes to copyright laws: authors should be protected over big tech

"In a scene that might have come from a dystopian novel, books were being stamped with “Human Authored” logos at this week’s London Book Fair. The Society of Authors described its labelling scheme as “an important sticking plaster to protect and promote human creativity in lieu of AI labelled content in the marketplace”.

Visitors to the fair were also being given copies of Don’t Steal This Book, an anthology of about 10,000 writers including Nobel laureate Kazuo Ishiguro, Malorie Blackman, Jeanette Winterson and Richard Osman, in which the pages are completely blank. The back cover states: “The UK government must not legalise book theft to benefit AI companies.” The message is clear: writers have had enough.

The fair comes the week before the government is due to deliver its progress report on AI and copyright, after proposals for a relaxation of existing laws caused outrage last year. Philippa Gregory, the novelist, described the plans for an “opt-out” policy, which puts the onus on writers to refuse permission for their work to be trawled, as akin to putting a sign on your front door asking burglars to pass by...

A House of Lords report published last week lays out two possible futures: one in which the UK “becomes a world-leading home for responsible, legalised artificial intelligence (AI) development” and another in which it continues “to drift towards tacit acceptance of large-scale, unlicensed use of creative content”. One scenario protects UK artists, the other benefits global tech companies. To avoid a world of empty content, the choice is clear."

What Was Grammarly Thinking?; The Atlantic, March 12, 2026

Kaitlyn Tiffany, The Atlantic; What Was Grammarly Thinking?

A short-lived AI tool promised to help users write like the greats—and a bunch of other random people, including me.

"But in the age of generative AI, there are many new kinds of copying. For instance, Wired reported last week on a tool offered by Grammarly, which briefly offered users the opportunity to put their writing through something called “Expert Review.” This produced AI-generated advice purportedly from the perspective of a bunch of famous authors, a bunch of less-famous working journalists (including myself, per The Verge’s reporting), and a bunch of academics (including some who had recently died).

I say “briefly” because the company deactivated the feature today. A lot of people got really mad about it because none of the experts had agreed for their work to be used in such a way, or to serve as uncompensated marketing for an app that people use to help them write more legible emails. “We hear the feedback and recognize we fell short on this,” the company’s CEO, Shishir Mehrotra, wrote on his LinkedIn page yesterday. Not long after, Wired reported that one of the journalists whose name had been used in the feature, Julia Angwin, was filing a class-action lawsuit against Grammarly’s owner, Superhuman Platform. In a statement forwarded by a spokesperson, Mehrotra repeated apologies made in his LinkedIn post and added, "We have reviewed the lawsuit, and we believe the legal claims are without merit and will strongly defend against them.”...

Now that I’ve looked more closely at this not-very-useful feature, and now that it’s shut down, the whole situation seems a little absurd. This was just a weird and inappropriate thing that a company tried to do to make money without putting in very much effort. The primary reason it became a news story at all was that it touched on widespread anxiety about whose work is worth what, whose skills will continue to be marketable in the age of AI, and whether any of us are really as complex, singular, and impossible-to-imitate as we might hope we are."

Tuesday, March 10, 2026

Nielsen's Gracenote sues OpenAI for copyright infringement; Axios, March 10, 2026

Sara Fischer, Axios; Nielsen's Gracenote sues OpenAI for copyright infringement

"How it works: Gracenote employs hundreds of editors who use human insight and judgment to create millions of narrative descriptions, original video descriptors, unique identifiers and other program identifiers that TV providers and other clients can use to help customers discover content. 

For example, Gracenote editors described HBO's "Game of Thrones" as "the depiction of two power families — kings and queens, knights and renegades, liars and honest men — playing a deadly game of control of the Seven Kingdoms of Westeros, and to sit atop the Iron Throne."

In the lawsuit, Gracenote alleges OpenAI scraped and used a near-exact copy of that descriptor when prompted by a ChatGPT user to describe "Game of Thrones." 

It provides several other examples where, with minimal prompting, OpenAI's various ChatGPT models recite large portions of Gracenote's program descriptions verbatim. 

Between the lines: Gracenote's entire Programs Database, which includes its metadata and the proprietary relational map its editors use to connect that data, is registered with the U.S. Copyright Office."

Thousands of authors publish ‘empty’ book in protest over AI using their work; The Guardian, March 10, 2026

The Guardian; Thousands of authors publish ‘empty’ book in protest over AI using their work

"Thousands of authors including Kazuo Ishiguro, Philippa Gregory and Richard Osman have published an “empty” book to protest against AI firms using their work without permission.

About 10,000 writers have contributed to Don’t Steal This Book, in which the only content is a list of their names. Copies of the work are being distributed to attenders at the London book fair on Tuesday, a week before the UK government is due to issue an assessment on the economic cost of proposed changes in copyright law."

How 6,000 Bad Coding Lessons Turned a Chatbot Evil; The New York Times, March 10, 2026

Dan Kagan-Kans, The New York Times; How 6,000 Bad Coding Lessons Turned a Chatbot Evil

"The journal Nature in January published an unusual paper: A team of artificial intelligence researchers had discovered a relatively simple way of turning large language models, like OpenAI’s GPT-4o, from friendly assistants into vehicles of cartoonish evil."

Saturday, March 7, 2026

Publishers Charge Anna’s Archive with Copyright Infringement; Publishers Weekly, March 6, 2026

Jim Milliot, Publishers Weekly; Publishers Charge Anna’s Archive with Copyright Infringement

"A group of publishers including the Big Five is taking legal action to prevent the pirate website Anna’s Archive from illegally copying and selling their copyrighted material.

In a filing made March 6 in the U.S. District Court for the Southern District of New York, 13 book and journal publishers filed suit seeking a permanent injunction to stop Anna’s Archive from copying and distributing millions of infringing files. The suit highlights the magnitude of the material Anna’s Archive has stolen and the unorthodox methods it uses to monetize the material.

In a separate lawsuit brought by Atlantic Recording Corp. in December alleging Anna’s Archive had stolen thousands of audio files from the record label, Atlantic alleged that the website also purported to host “61,344,044 books” and “95,527,824 papers,” as of the December 29, 2025 filing date.

The publishers’ complaint alleges that Anna’s Archive has added over 2 million books and 100,000 papers since Atlantic’s complaint was filed. The ongoing infringement is in keeping with Anna’s Archive’s goal “to take all the books in the world,” according to the publishers’ complaint."

Tuesday, February 24, 2026

YouTuber sues Runway AI in latest copyright class action over AI training; Reuters, February 24, 2026

Reuters; YouTuber sues Runway AI in latest copyright class action over AI training

"Artificial intelligence video startup Runway AI has been hit with a proposed class action lawsuit in California federal court for allegedly misusing YouTube content to train its video generation platform.

YouTube creator David Gardner said in the complaint, filed in Los Angeles on Monday, that Runway bypassed YouTube's copyright protections to illegally download user videos for its AI training."

Wednesday, February 11, 2026

Adam Schiff And John Curtis Introduce Bill To Require Tech To Disclose Copyrighted Works Used In AI Training Models; Deadline, February 10, 2026

Ted Johnson, Deadline; Adam Schiff And John Curtis Introduce Bill To Require Tech To Disclose Copyrighted Works Used In AI Training Models

"Sen. Adam Schiff (D-CA) and Sen. John Curtis (R-UT) are introducing a bill that touches on one of the hottest Hollywood-tech debates in the development of AI: The use of copyrighted works in training models.

The Copyright Labeling and Ethical AI Reporting Act would require companies to file a notice with the Register of Copyrights detailing the copyrighted works used to train datasets for an AI model. The notice would have to be filed before a new model is publicly released, and would apply retroactively to models already available to consumers.

The Copyright Office also would be required to establish a public database of the notices filed. There also would be civil penalties for failure to disclose the works used."

Friday, February 6, 2026

Publishers Strike Back Against Google in Infringement Suit; Publishers Weekly, February 6, 2026

Jim Milliot, Publishers Weekly; Publishers Strike Back Against Google in Infringement Suit

"The Association of American Publishers continued its fight this week to allow two of its members, Hachette Book Group and Cengage, to join a class action copyright infringement lawsuit against Google and its generative AI product Gemini. The lawsuit was first brought by a group of illustrators and writers in 2023.

In mid-January the AAP filed its first motion to allow the two publishers to take part in the lawsuit that is now before Judge Eumi K. Lee in the U.S. District Court for the Northern District of California. Earlier this week the AAP filed its reply to Google’s motion asking the court to block AAP’s request.

At the core of Google’s argument is the notion that the publishers should have asked to intervene sooner, as well as the assertion that publishers have no interest in the case because they don’t own authors’ works.

In its response, AAP argues that it was only when the case reached class certification that the publishers’ interests became clear. The new filing also rebuts Google’s other claim that publishers don’t own any rights.

“Google’s professed misunderstanding of ownership exemplifies exactly the kind of value that Proposed Intervenors bring to the case,” the AAP stated, arguing that both HBG and Cengage own certain rights to the works in question and that “scores” of other publishers will be impacted by the litigation."

Thursday, February 5, 2026

‘In the end, you feel blank’: India’s female workers watching hours of abusive content to train AI; The Guardian, February 5, 2026

Anuj Behal, The Guardian; ‘In the end, you feel blank’: India’s female workers watching hours of abusive content to train AI


[Kip Currier: The largely unaddressed plight of content moderators became more real for me after reading this haunting 9/9/24 piece in the Washington Post, "I quit my job as a content moderator. I can never go back to who I was before."

As mentioned in the graphic article's byline, content moderator Alberto Cuadra spoke with journalist Beatrix Lockwood. Maya Scarpa's illustrations poignantly give life to Alberto Cuadra's first-hand experiences and ongoing impacts from the content moderation he performed for an unnamed tech company. I talk about Cuadra's experiences and the ethical issues of content moderation, social media, and AI in my Ethics, Information, and Technology book.]


[Excerpt]

"Murmu, 26, is a content moderator for a global technology company, logging on from her village in India’s Jharkhand state. Her job is to classify images, videos and text that have been flagged by automated systems as possible violations of the platform’s rules.

On an average day, she views up to 800 videos and images, making judgments that train algorithms to recognise violence, abuse and harm.

This work sits at the core of machine learning’s recent breakthroughs, which rest on the fact that AI is only as good as the data it is trained on. In India, this labour is increasingly performed by women, who are part of a workforce often described as “ghost workers”.

“The first few months, I couldn’t sleep,” she says. “I would close my eyes and still see the screen loading.” Images followed her into her dreams: of fatal accidents, of losing family members, of sexual violence she could not stop or escape. On those nights, she says, her mother would wake and sit with her...

“In terms of risk,” she says, “content moderation belongs in the category of dangerous work, comparable to any lethal industry.”

Studies indicate content moderation triggers lasting cognitive and emotional strain, often resulting in behavioural changes such as heightened vigilance. Workers report intrusive thoughts, anxiety and sleep disturbances.

A study of content moderators published last December, which included workers in India, identified traumatic stress as the most pronounced psychological risk. The study found that even where workplace interventions and support mechanisms existed, significant levels of secondary trauma persisted."

Friday, January 30, 2026

The $1.5 Billion Reckoning: AI Copyright and the 2026 Regulatory Minefield; JD Supra, January 27, 2026

Rob Robinson, JD Supra; The $1.5 Billion Reckoning: AI Copyright and the 2026 Regulatory Minefield

"In the silent digital halls of early 2026, the era of “ask for forgiveness later” has finally hit a $1.5 billion brick wall. As legal frameworks in Brussels and New Delhi solidify, the wild west of AI training data is being partitioned into clearly marked zones of liability and license. For those who manage information, secure data, or navigate the murky waters of eDiscovery, this landscape is no longer a theoretical debate—it is an active regulatory battlefield where every byte of training data carries a price tag."

Music publishers sue Anthropic for $3B over ‘flagrant piracy’ of 20,000 works; TechCrunch, January 29, 2026

Amanda Silberling, TechCrunch; Music publishers sue Anthropic for $3B over ‘flagrant piracy’ of 20,000 works 

"A cohort of music publishers led by Concord Music Group and Universal Music Group are suing Anthropic, saying the company illegally downloaded more than 20,000 copyrighted songs, including sheet music, song lyrics, and musical compositions.

The publishers said in a statement on Wednesday that the damages could amount to more than $3 billion, which would be one of the largest non-class action copyright cases filed in U.S. history.

This lawsuit was filed by the same legal team from the Bartz v. Anthropic case, in which a group of fiction and nonfiction authors similarly accused the AI company of using their copyrighted works to train products like Claude."

Tuesday, January 27, 2026

YouTubers sue Snap for alleged copyright infringement in training its AI models; TechCrunch, January 26, 2026

Sarah Perez, TechCrunch; YouTubers sue Snap for alleged copyright infringement in training its AI models

"A group of YouTubers who are suing tech giants for scraping their videos without permission to train AI models has now added Snap to their list of defendants. The plaintiffs — internet content creators behind a trio of YouTube channels with roughly 6.2 million collective subscribers — allege that Snap has trained its AI systems on their video content for use in AI features like the app’s “Imagine Lens,” which allows users to edit images using text prompts.

The plaintiffs earlier filed similar lawsuits against Nvidia, Meta, and ByteDance over similar matters.

In the newly filed proposed class action suit, filed on Friday in the U.S. District Court for the Central District of California, the YouTubers specifically call out Snap for its use of a large-scale, video-language dataset known as HD-VILA-100M, and others that were designed only for academic and research purposes. To use these datasets for commercial purposes, the plaintiffs claim Snap circumvented YouTube’s technological restrictions, terms of service, and licensing limitations, which prohibit commercial use."

Monday, January 26, 2026

Search Engines, AI, And The Long Fight Over Fair Use; Electronic Frontier Foundation (EFF), January 23, 2026

JOE MULLIN, Electronic Frontier Foundation (EFF); Search Engines, AI, And The Long Fight Over Fair Use

"We're taking part in Copyright Week, a series of actions and discussions supporting key principles that should guide copyright policy. Every day this week, various groups are taking on different elements of copyright law and policy, and addressing what's at stake, and what we need to do to make sure that copyright promotes creativity and innovation.

Long before generative AI, copyright holders warned that new technologies for reading and analyzing information would destroy creativity. Internet search engines, they argued, were infringement machines—tools that copied copyrighted works at scale without permission. As they had with earlier information technologies like the photocopier and the VCR, copyright owners sued.

Courts disagreed. They recognized that copying works in order to understand, index, and locate information is a classic fair use—and a necessary condition for a free and open internet.

Today, the same argument is being recycled against AI. The question is whether copyright owners should be allowed to control how others analyze, reuse, and build on existing works."