Showing posts with label AI training data. Show all posts

Sunday, June 15, 2025

AI chatbots need more books to learn from. These libraries are opening their stacks; AP, June 12, 2025

Matt O'Brien, AP; AI chatbots need more books to learn from. These libraries are opening their stacks

"Supported by “unrestricted gifts” from Microsoft and ChatGPT maker OpenAI, the Harvard-based Institutional Data Initiative is working with libraries and museums around the world on how to make their historic collections AI-ready in a way that also benefits the communities they serve.

“We’re trying to move some of the power from this current AI moment back to these institutions,” said Aristana Scourtas, who manages research at Harvard Law School’s Library Innovation Lab. “Librarians have always been the stewards of data and the stewards of information.”

Harvard’s newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s — a Korean painter’s handwritten thoughts about cultivating flowers and trees. The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians. 

It promises to be a boon for AI developers trying to improve the accuracy and reliability of their systems."

Friday, June 13, 2025

How Disney’s AI lawsuit could shift the future of entertainment; The Washington Post, June 11, 2025

 

The Washington Post; How Disney’s AI lawsuit could shift the future of entertainment

"The battle over the future of AI-generated content escalated on Wednesday as two Hollywood titans sued a fast-growing AI start-up for copyright infringement.

Disney and Universal, whose entertainment empires include Pixar, Star Wars, Marvel and Despicable Me, sued Midjourney, claiming it wrongfully trained its image-generating AI models on the studios’ intellectual property.

They are the first major Hollywood studios to file copyright infringement lawsuits, marking a pivotal moment in the ongoing fight by artists, newspapers and content makers to stop AI firms from using their work as training data — or at least make them pay for it."

Thursday, June 12, 2025

In first-of-its-kind lawsuit, Hollywood giants sue AI firm for copyright infringement; NPR, June 12, 2025

NPR; In first-of-its-kind lawsuit, Hollywood giants sue AI firm for copyright infringement

"In a first-of-its-kind lawsuit, entertainment companies Disney and Universal are suing AI firm Midjourney for copyright infringement.

The 110-page lawsuit, filed Wednesday in a U.S. district court in Los Angeles, includes detailed appendices illustrating the plaintiffs' claims with visual examples and alleges that Midjourney stole "countless" copyrighted works to train its AI engine in the creation of AI-generated images."

Wednesday, June 11, 2025

Disney, Universal File First Major Studio Lawsuit Against AI Company, Sue Midjourney for Copyright Infringement: ‘This Is Theft’; Variety, June 11, 2025

 Todd Spangler, Variety; Disney, Universal File First Major Studio Lawsuit Against AI Company, Sue Midjourney for Copyright Infringement: ‘This Is Theft’

"Disney and NBCU filed a federal lawsuit Tuesday against Midjourney, a generative AI start-up, alleging copyright infringement. The companies alleged that Midjourney’s own website “displays hundreds, if not thousands, of images generated by its Image Service at the request of its subscribers that infringe Plaintiffs’ Copyrighted Works.”

A copy of the lawsuit is at this link...

Disney and NBCU’s lawsuit includes images alleged to be examples of instances of Midjourney’s infringement. Those include an image of Marvel’s Deadpool and Wolverine (pictured above), Iron Man, Spider-Man, the Hulk and more; Star Wars’ Darth Vader, Yoda, R2-D2, C-3PO and Chewbacca; Disney’s Princess Elsa and Olaf from “Frozen”; characters from “The Simpsons”; Pixar’s Buzz Lightyear from “Toy Story” and Lightning McQueen from “Cars”; DreamWorks’ “How to Train Your Dragon”; and Universal‘s “Shrek” and the yellow Minions from the “Despicable Me” film franchise."

Tuesday, June 10, 2025

Getty Images Faces Off Against Stability in Court as First Major AI Copyright Trial Begins; PetaPixel, June 10, 2025

Matt Growcoot, PetaPixel; Getty Images Faces Off Against Stability in Court as First Major AI Copyright Trial Begins

"The Guardian notes that the trial will focus on specific photos taken by famous photographers. Getty plans to bring up photos of the Chicago Cubs taken by sports photographer Gregory Shamus and photos of film director Christopher Nolan taken by Andreas Rentz. 

All-in-all, 78,000 pages of evidence have been disclosed for the case and AI experts are being called in to give testimonies. Getty is also suing Stability AI in the United States in a parallel case. The trial in London is expected to run for three weeks and will be followed by a written decision from the judge at a later date."

Monday, June 9, 2025

Getty argues its landmark UK copyright case does not threaten AI; Reuters, June 9, 2025

Reuters; Getty argues its landmark UK copyright case does not threaten AI

 "Getty Images' landmark copyright lawsuit against artificial intelligence company Stability AI began at London's High Court on Monday, with Getty rejecting Stability AI's contention the case posed a threat to the generative AI industry.

Seattle-based Getty, which produces editorial content and creative stock images and video, accuses Stability AI of using its images to "train" its Stable Diffusion system, which can generate images from text inputs...

Creative industries are grappling with the legal and ethical implications of AI models that can produce their own work after being trained on existing material. Prominent figures including Elton John have called for greater protections for artists.

Lawyers say Getty's case will have a major impact on the law, as well as potentially informing government policy on copyright protections relating to AI."

Saturday, June 7, 2025

UK government signals it will not force tech firms to disclose how they train AI; The Guardian, June 6, 2025

The Guardian; UK government signals it will not force tech firms to disclose how they train AI

"Opponents of the plans have warned that even if the attempts to insert clauses into the data bill fail, the government could be challenged in the courts over the proposed changes.

The consultation on copyright changes, which is due to produce its findings before the end of the year, contains four options: to let AI companies use copyrighted work without permission, alongside an option for artists to “opt out” of the process; to leave the situation unchanged; to require AI companies to seek licences for using copyrighted work; and to allow AI firms to use copyrighted work with no opt-out for creative companies and individuals.

The technology secretary, Peter Kyle, has said the copyright-waiver-plus-opt-out scenario is no longer the government’s preferred option, but Kidron’s amendments have attempted to head off that option by effectively requiring tech companies to seek licensing deals for any content that they use to train their AI models."

How AI and copyright turned into a political nightmare for Labour; Politico.eu, June 4, 2025

Joseph Bambridge, Politico.eu; How AI and copyright turned into a political nightmare for Labour

"The Data (Use and Access Bill) has ricocheted between the Commons and the Lords in an extraordinarily long incidence of ping-pong, with both Houses digging their heels in and a frenzied lobbying battle on all sides."

Friday, June 6, 2025

AI firms say they can’t respect copyright. These researchers tried.; The Washington Post, June 5, 2025

The Washington Post; AI firms say they can’t respect copyright. These researchers tried.

"A group of more than two dozen AI researchers have found that they could build a massive eight-terabyte dataset using only text that was openly licensed or in the public domain. They tested the dataset quality by using it to train a 7 billion parameter language model, which performed about as well as comparable industry efforts, such as Llama 2-7B, which Meta released in 2023.

A paper published Thursday detailing their effort also reveals that the process was painstaking, arduous and impossible to fully automate.

The group built an AI model that is significantly smaller than the latest offered by OpenAI’s ChatGPT or Google’s Gemini, but their findings appear to represent the biggest, most transparent and rigorous effort yet to demonstrate a different way of building popular AI tools.

That could have implications for the policy debate swirling around AI and copyright.

The paper itself does not take a position on whether scraping text to train AI is fair use.

That debate has reignited in recent weeks with a high-profile lawsuit and dramatic turns around copyright law and enforcement in both the U.S. and U.K."

 

The U.S. Copyright Office used to be fairly low-drama. Not anymore; NPR, June 6, 2025

NPR; The U.S. Copyright Office used to be fairly low-drama. Not anymore

"The U.S. Copyright Office is normally a quiet place. It mostly exists to register materials for copyright and advise members of Congress on copyright issues. Experts and insiders used words like "stable" and "sleepy" to describe the agency. Not anymore...

Inside the AI report

That big bombshell report on generative AI and copyright can be summed up like this – in some instances, using copyrighted material to train AI models could count as fair use. In other cases, it wouldn't.

The conclusion of the report says this: "Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs—all of which can affect the market."

"It's very even keeled," said Keith Kupferschmid, CEO of the Copyright Alliance, a group that represents artists and publishers pushing for stronger copyright laws.

Kupferschmid said the report avoids generalizations and takes arguments on a case-by-case basis.

"Perlmutter was beloved, no matter whether you agreed with her or not, because she did the hard work," Kupferschmid said. "She always was very thoughtful and considered all these different viewpoints."

It remains to be seen how the report will be used in the dozens of legal cases over copyright and AI usage."

Thursday, June 5, 2025

Government AI copyright plan suffers fourth House of Lords defeat; BBC, June 2, 2025

Zoe Kleinman, BBC; Government AI copyright plan suffers fourth House of Lords defeat

"The argument is over how best to balance the demands of two huge industries: the tech and creative sectors. 

More specifically, it's about the fairest way to allow AI developers access to creative content in order to make better AI tools - without undermining the livelihoods of the people who make that content in the first place.

What's sparked it is the Data (Use and Access) Bill.

This proposed legislation was broadly expected to finish its long journey through parliament this week and sail off into the law books. 

Instead, it is currently stuck in limbo, ping-ponging between the House of Lords and the House of Commons.

A government consultation proposes AI developers should have access to all content unless its individual owners choose to opt out. 

But 242 members of the House of Lords disagree with the bill in its current form.

They think AI firms should be forced to disclose which copyrighted material they use to train their tools, with a view to licensing it."

Monday, June 2, 2025

The AI copyright standoff continues - with no solution in sight; BBC, June 2, 2025

Zoe Kleinman, BBC; The AI copyright standoff continues - with no solution in sight

"The fierce battle over artificial intelligence (AI) and copyright - which pits the government against some of the biggest names in the creative industry - returns to the House of Lords on Monday with little sign of a solution in sight.

A huge row has kicked off between ministers and peers who back the artists, and shows no sign of abating. 

It might be about AI but at its heart are very human issues: jobs and creativity.

It's highly unusual that neither side has backed down by now or shown any sign of compromise; in fact if anything support for those opposing the government is growing rather than tailing off."

Sunday, June 1, 2025

U.S. Copyright Office Shocks Big Tech With AI Fair Use Rebuke; Forbes, May 29, 2025

Tor Constantino, Forbes; U.S. Copyright Office Shocks Big Tech With AI Fair Use Rebuke

 "The U.S. Copyright Office released its long-awaited report on generative AI training and copyright infringement on May 9, just one day after President Trump abruptly fired Librarian of Congress Carla Hayden. Within 48 hours, Register of Copyrights Shira Perlmutter was also reportedly out, after the agency rushed to publish a “pre-publication version” of its guidance — suggesting urgency, if not outright alarm, within the office.

This timing was no coincidence. “We practitioners were anticipating this report and knew it was being finalized, but its release was a surprise,” said Yelena Ambartsumian, an AI governance and IP lawyer and founder of Ambart Law. “The fact that it dropped as a pre-publication version, the day after the librarian was fired, signals to me that the Copyright Office expected its own leadership to be next.”

At the center of the report is a sharply contested issue: whether using copyrighted works to train AI models qualifies as “fair use.” And the office’s position is a bold departure from the narrative that major AI companies like OpenAI and Google have relied on in court...

The office stopped short of declaring that all AI training is infringement. Instead, it emphasized that each case must be evaluated on its specific facts — a reminder that fair use remains a flexible doctrine, not a blanket permission slip."

Friday, May 30, 2025

It’s too expensive to fight every AI copyright battle, Getty CEO says; Ars Technica, May 28, 2025

Ashley Belanger, Ars Technica; It’s too expensive to fight every AI copyright battle, Getty CEO says


[Kip Currier: As of May 2025, New York Stock Exchange (NYSE) data values Getty Images at nearly three-quarters of a billion dollars.

So it's noteworthy and should give individual creators pause that even a company of that size is publicly acknowledging the financial realities of copyright litigation against AI tech companies like Stability AI.

Even if the courts should determine that AI tech companies can prevail on fair use grounds against copyright infringement claims, isn't there something fundamentally unfair and unethical about AI tech oligarchs being able to devour and digest everyone else's copyrighted works, and then alchemize that improperly-taken aggregation of creativity into new IP works that they can monetize, with no recompense given to the original creators?

Just because someone can do something, doesn't mean they should be able to do it.

AI tech company leaders like Elon Musk, Sam Altman, Mark Zuckerberg et al would never stand for similar uses of their works without permission or compensation. 

Neither should creators. Quid pulchrum est (What's fair is fair).

If the courts do side with AI tech companies, new federal legislation may need to be enacted to provide protections for content creators from the AI tech companies that want and need their content to power up novel iterations of their AI tools via ever-increasing amounts of training data. 

In the current Congress, that's not likely to happen. But it may be possible after 2026 or 2028. If enough content creators make their voices heard through their grassroots advocacy and votes at the ballot box.]


[Excerpt]

"On Bluesky, a trial lawyer, Max Kennerly, effectively satirized Clegg and the whole AI industry by writing, “Our product creates such little value that it is simply not viable in the marketplace, not even as a niche product. Therefore, we must be allowed to unilaterally extract value from the work of others and convert that value into our profits.”"

Thursday, May 29, 2025

The Copyright Office’s Report on AI Training Material and Fair Use: Will It Stymie the U.S. AI Industry?; The Federalist Society, May 29, 2025

John Blanton Farmer, The Federalist Society; The Copyright Office’s Report on AI Training Material and Fair Use: Will It Stymie the U.S. AI Industry?

"Will the Trump Administration Withdraw the Report?

The Trump Administration might withdraw this report.

The Trump Administration is friendlier to the U.S. AI industry than the Biden Administration was. Shortly after taking office, it rescinded a Biden Administration executive order on the development and use of AI, which was restrictive and burdensome.

The day before the report was released, the Trump Administration fired the head of the Library of Congress, which oversees the USCO. The day after the report was issued, it fired the head of the USCO. The administration didn’t comment on whether these firings were related to the report.

The USCO may have rushed out the report to prevent the Trump Administration from meddling with it. The version released was labeled a “pre-publication version.” It’s unusual to release a non-final version.

This report is not the law. Courts will decide this fair use issue. They’ll certainly consider this report, but they aren’t bound to follow it."

Saturday, May 24, 2025

Judge Hints Anthropic’s AI Training on Books Is Fair Use; Bloomberg Law, May 22, 2025

Bloomberg Law; Judge Hints Anthropic’s AI Training on Books Is Fair Use

"A California federal judge is leaning toward finding Anthropic PBC violated copyright law when it made initial copies of pirated books, but that its subsequent uses to train its generative AI models qualify as fair use.

“I’m inclined to say they did violate the Copyright Act but the subsequent uses were fair use,” Judge William Alsup said Thursday during a hearing in San Francisco. “That’s kind of the way I’m leaning right now,” he said, but concluded the 90-minute hearing by clarifying that his decision isn’t final. “Sometimes I say that and change my mind."...

The first judge to rule will provide a window into how federal courts interpret the fair use argument for training generative artificial intelligence models with copyrighted materials. A decision against Anthropic could disrupt the billion-dollar business model behind many AI companies, which rely on the belief that training with unlicensed copyrighted content doesn’t violate the law."

The Library of Congress Shake-Up Endangers Copyrights; Bloomberg, May 24, 2025

Stephen Mihm, Bloomberg; The Library of Congress Shake-Up Endangers Copyrights

Wednesday, May 21, 2025

Most AI chatbots easily tricked into giving dangerous responses, study finds; The Guardian, May 21, 2025

The Guardian; Most AI chatbots easily tricked into giving dangerous responses, study finds

"Hacked AI-powered chatbots threaten to make dangerous knowledge readily available by churning out illicit information the programs absorb during training, researchers say.

The warning comes amid a disturbing trend for chatbots that have been “jailbroken” to circumvent their built-in safety controls. The restrictions are supposed to prevent the programs from providing harmful, biased or inappropriate responses to users’ questions.

The engines that power chatbots such as ChatGPT, Gemini and Claude – large language models (LLMs) – are fed vast amounts of material from the internet.

Despite efforts to strip harmful text from the training data, LLMs can still absorb information about illegal activities such as hacking, money laundering, insider trading and bomb-making. The security controls are designed to stop them using that information in their responses.

In a report on the threat, the researchers conclude that it is easy to trick most AI-driven chatbots into generating harmful and illegal information, showing that the risk is “immediate, tangible and deeply concerning”...

The research, led by Prof Lior Rokach and Dr Michael Fire at Ben Gurion University of the Negev in Israel, identified a growing threat from “dark LLMs”, AI models that are either deliberately designed without safety controls or modified through jailbreaks. Some are openly advertised online as having “no ethical guardrails” and being willing to assist with illegal activities such as cybercrime and fraud."

Tuesday, May 20, 2025

The AI and Copyright Issues Dividing Trump’s Court; Jacobin, May 19, 2025

David Moscrop, Jacobin; The AI and Copyright Issues Dividing Trump’s Court

"As many have pointed out, the copyright-AI battle is not only a central struggle within the Trump administration; it is also a broader conflict over who controls intellectual property and to what end. For decades, corporations have abused copyright to unreasonably extend coverage periods and impoverish the public domain. Their goal: maximizing both control over IP and profits. But AI firms aren’t interested in reforming that system. They’re not looking to open access or enrich the commons — they just want training data. And in fighting for it, they may end up reshaping copyright law in ways that outlast this administration.

As Nguyen notes, after the Register of Copyrights, Shira Perlmutter, was turfed by DOGE-aligned officials, Trump antitrust adviser Mike Davis posted to Truth Social: “Now tech bros are going to steal creators’ copyrights for AI profits. . . . This is 100 percent unacceptable.” Trump reposted it. That’s the shape of the struggle: MAGA populists, who see their own content as sacred property, are up against a tech elite that views all content as extractable fuel."