Showing posts with label pirate libraries. Show all posts

Wednesday, July 30, 2025

Insuring Intellectual Property – Examining AI and Fair Use; The National Law Review, July 29, 2025

Michael S. Levine, Geoffrey B. Fehling, Armin Ghiam, and Madalyn "Mady" Moore of Hunton Andrews Kurth, The National Law Review; Insuring Intellectual Property – Examining AI and Fair Use

"The frequency of lawsuits involving the development and deployment of AI technologies is increasing by the day. Recent lawsuits seeking to hold companies directly and secondarily liable for “joint enterprises” based on use (or alleged misuse) of copyrighted works for training AI models serve as important reminders about the protections that intellectual property (IP) insurance can offer to cover the risks associated with copyright infringement claims.

Recently, a California federal district court ruled that it was “fair use” for an AI software company to use copyrighted books to train its large language models (LLMs). However, the court also found the company’s unauthorized possession of over seven million pirated books that it downloaded from the internet (apparently for free) amounted to copyright infringement independent from whether the books were ultimately used to train the LLMs. In contrast, where the company purchased books before scanning them into digital files, the use was a permissible “fair use.”

The court’s order in Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. June 23, 2025), highlights the nuanced permissible use of copyrighted training data and underscores why policyholders engaged in the use of copyrighted material should acquire and maintain robust IP insurance that will reliably respond to claims of alleged infringement."

Tuesday, July 29, 2025

Meta pirated and seeded porn for years to train AI, lawsuit says; Ars Technica, July 28, 2025

ASHLEY BELANGER, Ars Technica; Meta pirated and seeded porn for years to train AI, lawsuit says

"Porn sites may have blown up Meta's key defense in a copyright fight with book authors who earlier this year said that Meta torrented "at least 81.7 terabytes of data across multiple shadow libraries" to train its AI models.

Meta has defeated most of the authors' claims and claimed there is no proof that Meta ever uploaded pirated data through seeding or leeching on the BitTorrent network used to download training data. But authors still have a chance to prove that Meta may have profited off its massive piracy, and a new lawsuit filed by adult sites last week appears to contain evidence that could help authors win their fight, TorrentFreak reported.

The new lawsuit was filed last Friday in a US district court in California by Strike 3 Holdings—which says it attracts "over 25 million monthly visitors" to sites that serve as "ethical sources" for adult videos that "are famous for redefining adult content with Hollywood style and quality."

After authors revealed Meta's torrenting, Strike 3 Holdings checked its proprietary BitTorrent-tracking tools designed to detect infringement of its videos and alleged that the company found evidence that Meta has been torrenting and seeding its copyrighted content for years—since at least 2018. Some of the IP addresses were clearly registered to Meta, while others appeared to be "hidden," and at least one was linked to a Meta employee, the filing said."

Monday, July 28, 2025

A copyright lawsuit over pirated books could result in ‘business-ending’ damages for Anthropic; Fortune, July 28, 2025

BEATRICE NOLAN, Fortune; A copyright lawsuit over pirated books could result in ‘business-ending’ damages for Anthropic

"A class-action lawsuit against Anthropic could expose the AI company to billions in copyright damages over its alleged use of pirated books from shadow libraries like LibGen and PiLiMi to train its models. While a federal judge ruled that training on lawfully obtained books may qualify as fair use, the court will hold a separate trial to address the allegedly illegal acquisition and storage of copyrighted works. Legal experts warn that statutory damages could be severe, with estimates ranging from $1 billion to over $100 billion."

Sunday, July 20, 2025

AI guzzled millions of books without permission. Authors are fighting back.; The Washington Post, July 19, 2025

The Washington Post; AI guzzled millions of books without permission. Authors are fighting back.


[Kip Currier: I've written this before on this blog and I'll say it again: technology companies would never allow anyone to freely vacuum up their content and use it without permission or compensation. Period. Full Stop.]


[Excerpt]

"Baldacci is among a group of authors suing OpenAI and Microsoft over the companies’ use of their work to train the AI software behind tools such as ChatGPT and Copilot without permission or payment — one of more than 40 lawsuits against AI companies advancing through the nation’s courts. He and other authors this week appealed to Congress for help standing up to what they see as an assault by Big Tech on their profession and the soul of literature.

They found sympathetic ears at a Senate subcommittee hearing Wednesday, where lawmakers expressed outrage at the technology industry’s practices. Their cause gained further momentum Thursday when a federal judge granted class-action status to another group of authors who allege that the AI firm Anthropic pirated their books.

“I see it as one of the moral issues of our time with respect to technology,” Ralph Eubanks, an author and University of Mississippi professor who is president of the Authors Guild, said in a phone interview. “Sometimes it keeps me up at night.”

Lawsuits have revealed that some AI companies had used legally dubious “torrent” sites to download millions of digitized books without having to pay for them."

Judge Rules Class Action Suit Against Anthropic Can Proceed; Publishers Weekly, July 18, 2025

Jim Milliot, Publishers Weekly; Judge Rules Class Action Suit Against Anthropic Can Proceed

"In a major victory for authors, U.S. District Judge William Alsup ruled July 17 that three writers suing Anthropic for copyright infringement can represent all other authors whose books the AI company allegedly pirated to train its AI model as part of a class action lawsuit.

In late June, Alsup of the Northern District of California, ruled in Bartz v. Anthropic that the AI company's training of its Claude LLMs on authors' works was "exceedingly transformative," and therefore protected by fair use. However, Alsup also determined that the company's practice of downloading pirated books from sites including Books3, Library Genesis, and Pirate Library Mirror (PiLiMi) to build a permanent digital library was not covered by fair use.

Alsup’s most recent ruling follows an amended complaint from the authors looking to certify classes of copyright owners in a “Pirated Books Class” and in a “Scanned Books Class.” In his decision, Alsup certified only a LibGen and PiLiMi Pirated Books Class, writing that “this class is limited to actual or beneficial owners of timely registered copyrights in ISBN/ASIN-bearing books downloaded by Anthropic from these two pirate libraries.”

Alsup stressed that “the class is not limited to authors or author-like entities,” explaining that “a key point is to cover everyone who owns the specific copyright interest in play, the right to make copies, either as the actual or as the beneficial owner.” Later in his decision, Alsup makes it clear who is covered by the ruling: “A beneficial owner...is someone like an author who receives royalties from any publisher’s revenues or recoveries from the right to make copies. Yes, the legal owner might be the publisher but the author has a definite stake in the royalties, so the author has standing to sue. And, each stands to benefit from the copyright enforcement at the core of our case however they then divide the benefit.”"

US authors suing Anthropic can band together in copyright class action, judge rules; Reuters, July 17, 2025

Reuters; US authors suing Anthropic can band together in copyright class action, judge rules

"A California federal judge ruled on Thursday that three authors suing artificial intelligence startup Anthropic for copyright infringement can represent writers nationwide whose books Anthropic allegedly pirated to train its AI system.

U.S. District Judge William Alsup said the authors can bring a class action on behalf of all U.S. writers whose works Anthropic allegedly downloaded from "pirate libraries" LibGen and PiLiMi to create a repository of millions of books in 2021 and 2022."

Sunday, June 29, 2025

An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy; Los Angeles Times, June 27, 2025

 Michael Hiltzik , Los Angeles Times; An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy


[Kip Currier: Excellent informative overview of some of the principal issues, players, stakes, and recent decisions in the ongoing AI copyright legal battles. Definitely worth 5-10 minutes of your time to read and reflect on.

A key take-away, derived from Judge Vince Chhabria's decision in last week's Meta win, is that:

Artists and authors can win their copyright infringement cases if they produce evidence showing the bots are affecting their market. Chhabria all but pleaded for the plaintiffs to bring some such evidence before him: 

“It’s hard to imagine that it can be fair use to use copyrighted books...to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books.” 

But “the plaintiffs never so much as mentioned it,” he lamented.

https://www.latimes.com/business/story/2025-06-27/an-ai-firm-won-a-lawsuit-over-copyright-infringement-but-may-face-a-huge-bill-for-piracy]


[Excerpt]

"Anthropic had to acknowledge a troubling qualification in Alsup’s order, however. Although he found for the company on the copyright issue, he also noted that it had downloaded copies of more than 7 million books from online “shadow libraries,” which included countless copyrighted works, without permission. 

That action was “inherently, irredeemably infringing,” Alsup concluded. “We will have a trial on the pirated copies...and the resulting damages,” he advised Anthropic ominously: Piracy on that scale could expose the company to judgments worth untold millions of dollars...

“Neither case is going to be the last word” in the battle between copyright holders and AI developers, says Aaron Moss, a Los Angeles attorney specializing in copyright law. With more than 40 lawsuits on court dockets around the country, he told me, “it’s too early to declare that either side is going to win the ultimate battle.”...

With billions of dollars, even trillions, at stake for AI developers and the artistic community, no one expects the law to be resolved until the issue reaches the Supreme Court, presumably years from now...

But Anthropic also downloaded copies of more than 7 million books from online “shadow libraries,” which include untold copyrighted works without permission. 

Anthropic “could have purchased books, but it preferred to steal them to avoid ‘legal/practice/business slog,’” Alsup wrote. (He was quoting Anthropic co-founder and CEO Dario Amodei.)...

Artists and authors can win their copyright infringement cases if they produce evidence showing the bots are affecting their market...

The truth is that the AI camp is just trying to get out of paying for something instead of getting it for free. Never mind the trillions of dollars in revenue they say they expect over the next decade — they claim that licensing will be so expensive it will stop the march of this supposedly historic technology dead in its tracks.

Chhabria aptly called this argument “nonsense.” If using books for training is as valuable as the AI firms say they are, he noted, then surely a market for book licensing will emerge. That is, it will — if the courts don’t give the firms the right to use stolen works without compensation."

Tuesday, May 6, 2025

Meta lawsuit poses first big test of AI copyright battle; Financial Times, May 1, 2025

Financial Times; Meta lawsuit poses first big test of AI copyright battle

 "The case, which has been brought by about a dozen authors including Ta-Nehisi Coates and Richard Kadrey, is centred on the $1.4tn social media giant’s use of LibGen, a so-called shadow library of millions of books, academic articles and comics, to train its Llama AI models. The ruling will have wide-reaching implications in the fierce copyright battle between artists and AI groups and is one of several lawsuits around the world that allege technology groups are using content without permission."

Friday, July 14, 2023

"Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI; Quartz, July 10, 2023

Michelle Cheng, Quartz; "Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI

"However, there are clues about these two data sets. “Books1” is linked to Project Gutenberg (an online e-book library with over 60,000 titles), a popular dataset for AI researchers to train their data on due to the lack of copyright, the filing states. “Books2” is estimated to contain about 294,000 titles, it notes.

Most of the “internet-based books corpora” is likely to come from shadow library websites such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. The books aggregated by these sites are available in bulk via torrent websites, which are known for hosting copyrighted materials.

What exactly are shadow libraries?

Shadow libraries are online databases that provide access to millions of books and articles that are out of print, hard to obtain, and paywalled. Many of these databases, which began appearing online around 2008, originated in Russia, which has a long tradition of sharing forbidden books, according to the magazine Reason.

Soon enough, these libraries became popular with cash-strapped academics around the world thanks to the high cost of accessing scholarly journals—with some reportedly going for as much as $500 for an entirely open-access article.

These shadow libraries are also called “pirate libraries” because they often infringe on copyrighted work and cut into the publishing industry’s profits. A 2017 Nielsen and Digimarc study (pdf) found that pirated books were “depressing legitimate book sales by as much as 14%.”"