Showing posts with label shadow libraries. Show all posts

Sunday, February 16, 2025

Court filings show Meta paused efforts to license books for AI training; TechCrunch, February 14, 2025

Kyle Wiggers, TechCrunch; Court filings show Meta paused efforts to license books for AI training

"According to one transcript, Sy Choudhury, who leads Meta’s AI partnership initiatives, said that Meta’s outreach to various publishers was met with “very slow uptake in engagement and interest.”

“I don’t recall the entire list, but I remember we had made a long list from initially scouring the Internet of top publishers, et cetera,” Choudhury said, per the transcript, “and we didn’t get contact and feedback from — from a lot of our cold call outreaches to try to establish contact.”

Choudhury added, “There were a few, like, that did, you know, engage, but not many.”

According to the court transcripts, Meta paused certain AI-related book licensing efforts in early April 2023 after encountering “timing” and other logistical setbacks. Choudhury said some publishers, in particular fiction book publishers, turned out to not in fact have the rights to the content that Meta was considering licensing, per a transcript.

“I’d like to point out that the — in the fiction category, we quickly learned from the business development team that most of the publishers we were talking to, they themselves were representing that they did not have, actually, the rights to license the data to us,” Choudhury said. “And so it would take a long time to engage with all their authors.”"

Thursday, August 10, 2023

Prosecraft has infuriated authors by using their books without consent – but what does copyright law say?; The Conversation, August 9, 2023

Associate Professor, University of New South Wales, UNSW Sydney, The Conversation; Prosecraft has infuriated authors by using their books without consent – but what does copyright law say?

"In amending its laws, Australia legislated that parody or satire could form the basis of a fair dealing exception. A specific transformative use exception was not created. 

So, it is significantly less clear as to whether the use contemplated by Prosecraft or Shaxpir would be considered fair dealing in Australia. 

Australia has either missed a trick or dodged a bullet by failing to include transformative use as a fair dealing exception. It depends where you stand in the ongoing conflict between AI tech and human authors. But Australia’s laws are less AI-friendly than the US.

For the moment, published human authors are banking on the idea that if they can knock out the shadow library, they can hobble the reach of AI tech."

Friday, July 14, 2023

"Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI; Quartz, July 10, 2023

Michelle Cheng, Quartz; "Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI

"However, there are clues about these two data sets. “Books1” is linked to Project Gutenberg (an online e-book library with over 60,000 titles), a popular dataset for AI researchers to train their data on due to the lack of copyright, the filing states. “Books2” is estimated to contain about 294,000 titles, it notes.

Most of the “internet-based books corpora” is likely to come from shadow library websites such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. The books aggregated by these sites are available in bulk via torrent websites, which are known for hosting copyrighted materials.

What exactly are shadow libraries?

Shadow libraries are online databases that provide access to millions of books and articles that are out of print, hard to obtain, and paywalled. Many of these databases, which began appearing online around 2008, originated in Russia, which has a long tradition of sharing forbidden books, according to the magazine Reason.

Soon enough, these libraries became popular with cash-strapped academics around the world thanks to the high cost of accessing scholarly journals—with some reportedly going for as much as $500 for an entirely open-access article.

These shadow libraries are also called “pirate libraries” because they often infringe on copyrighted work and cut into the publishing industry’s profits. A 2017 Nielsen and Digimarc study (pdf) found that pirated books were “depressing legitimate book sales by as much as 14%.”"