Monday, March 11, 2024

Nvidia sued over AI training data as copyright clashes continue; Ars Technica, March 11, 2024


"Book authors are suing Nvidia, alleging that the chipmaker's AI platform NeMo—used to power customized chatbots—was trained on a controversial dataset that illegally copied and distributed their books without their consent.

In a proposed class action, novelists Abdi Nazemian (Like a Love Story), Brian Keene (Ghost Walk), and Stewart O’Nan (Last Night at the Lobster) argued that Nvidia should pay damages and destroy all copies of the Books3 dataset used to power NeMo large language models (LLMs).

The Books3 dataset, novelists argued, copied 'all of Bibliotik,' a shadow library of approximately 196,640 pirated books. Initially shared through the AI community Hugging Face, the Books3 dataset today 'is defunct and no longer accessible due to reported copyright infringement,' the Hugging Face website says.

According to the authors, Hugging Face removed the dataset last October, but not before AI companies like Nvidia grabbed it and 'made multiple copies.' By training NeMo models on this dataset, the authors alleged that Nvidia 'violated their exclusive rights under the Copyright Act.' The authors argued that the US district court in San Francisco must intervene and stop Nvidia because the company 'has continued to make copies of the Infringed Works for training other models.'"
