Showing posts with label pirate libraries. Show all posts
Showing posts with label pirate libraries. Show all posts

Sunday, June 29, 2025

An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy; Los Angeles Times, June 27, 2025

 Michael Hiltzik , Los Angeles Times; An AI firm won a lawsuit for copyright infringement — but may face a huge bill for piracy


[Kip Currier: Excellent informative overview of some of the principal issues, players, stakes, and recent decisions in the ongoing AI copyright legal battles. Definitely worth 5-10 minutes of your time to read and reflect on.

A key take-away, derived from Judge Vince Chhabria's decision in last week's Meta win, is that:

Artists and authors can win their copyright infringement cases if they produce evidence showing the bots are affecting their market. Chhabria all but pleaded for the plaintiffs to bring some such evidence before him: 

“It’s hard to imagine that it can be fair use to use copyrighted books...to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books.” 

But “the plaintiffs never so much as mentioned it,” he lamented.

https://www.latimes.com/business/story/2025-06-27/an-ai-firm-won-a-lawsuit-over-copyright-infringement-but-may-face-a-huge-bill-for-piracy]


[Excerpt]

"Anthropic had to acknowledge a troubling qualification in Alsup’s order, however. Although he found for the company on the copyright issue, he also noted that it had downloaded copies of more than 7 million books from online “shadow libraries,” which included countless copyrighted works, without permission. 

That action was “inherently, irredeemably infringing,” Alsup concluded. “We will have a trial on the pirated copies...and the resulting damages,” he advised Anthropic ominously: Piracy on that scale could expose the company to judgments worth untold millions of dollars...

“Neither case is going to be the last word” in the battle between copyright holders and AI developers, says Aaron Moss, a Los Angeles attorney specializing in copyright law. With more than 40 lawsuits on court dockets around the country, he told me, “it’s too early to declare that either side is going to win the ultimate battle.”...

With billions of dollars, even trillions, at stake for AI developers and the artistic community at stake, no one expects the law to be resolved until the issue reaches the Supreme Court, presumably years from now...

But Anthropic also downloaded copies of more than 7 million books from online “shadow libraries,” which include untold copyrighted works without permission. 

Alsup wrote that Anthropic “could have purchased books, but it preferred to steal them to avoid ‘legal/practice/business slog,’” Alsup wrote. (He was quoting Anthropic co-founder and CEO Dario Amodei.)...

Artists and authors can win their copyright infringement cases if they produce evidence showing the bots are affecting their market."...

The truth is that the AI camp is just trying to get out of paying for something instead of getting it for free. Never mind the trillions of dollars in revenue they say they expect over the next decade — they claim that licensing will be so expensive it will stop the march of this supposedly historic technology dead in its tracks.

Chhabria aptly called this argument “nonsense.” If using books for training is as valuable as the AI firms say they are, he noted, then surely a market for book licensing will emerge. That is, it will — if the courts don’t give the firms the right to use stolen works without compensation."

Tuesday, May 6, 2025

Meta lawsuit poses first big test of AI copyright battle; Financial Times, May 1, 2025

 and , Financial Times; Meta lawsuit poses first big test of AI copyright battle

 "The case, which has been brought by about a dozen authors including Ta-Nehisi Coates and Richard Kadrey, is centred on the $1.4tn social media giant’s use of LibGen, a so-called shadow library of millions of books, academic articles and comics, to train its Llama AI models. The ruling will have wide-reaching implications in the fierce copyright battle between artists and AI groups and is one of several lawsuits around the world that allege technology groups are using content without permission."

Friday, July 14, 2023

"Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI; Quartz, July 10, 2023

Michelle Cheng, Quartz; "Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI

"However, there are clues about these two data sets. “Books1” is linked to Project Gutenberg (an online e-book library with over 60,000 titles), a popular dataset for AI researchers to train their data on due to the lack of copyright, the filing states. “Books2” is estimated to contain about 294,000 titles, it notes.

Most of the “internet-based books corpora” is likely to come from shadow library websites such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. The books aggregated by these sites are available in bulk via torrent websites, which are known for hosting copyrighted materials

What exactly are shadow libraries?

Shadow libraries are online databases that provide access to millions of books and articles that are out of print, hard to obtain, and paywalled. Many of these databases, which began appearing online around 2008, originated in Russia, which has a long tradition of sharing forbidden books, according to the magazine Reason.

Soon enough, these libraries became popular with cash-strapped academics around the world thanks to the high cost of accessing scholarly journals—with some reportedly going for as much as $500 for an entirely open-access article.

These shadow libraries are also called “pirate libraries” because they often infringe on copyrighted work and cut into the publishing industry’s profits. A 2017 Nielsen and Digimarc study (pdf) found that pirated books were “depressing legitimate book sales by as much as 14%.”"