Intellectual Property (IP), Artificial Intelligence (AI), Open Movements (OM) : Hugging Face

Monday, March 11, 2024

Nvidia sued over AI training data as copyright clashes continue; Ars Technica, March 11, 2024

ASHLEY BELANGER, Ars Technica; Nvidia sued over AI training data as copyright clashes continue

"Book authors are suing Nvidia, alleging that the chipmaker's AI platform NeMo—used to power customized chatbots—was trained on a controversial dataset that illegally copied and distributed their books without their consent.

In a proposed class action, novelists Abdi Nazemian (Like a Love Story), Brian Keene (Ghost Walk), and Stewart O’Nan (Last Night at the Lobster) argued that Nvidia should pay damages and destroy all copies of the Books3 dataset used to power NeMo large language models (LLMs).

The Books3 dataset, novelists argued, copied "all of Bibliotek," a shadow library of approximately 196,640 pirated books. Initially shared through the AI community Hugging Face, the Books3 dataset today "is defunct and no longer accessible due to reported copyright infringement," the Hugging Face website says.

According to the authors, Hugging Face removed the dataset last October, but not before AI companies like Nvidia grabbed it and "made multiple copies." By training NeMo models on this dataset, the authors alleged that Nvidia "violated their exclusive rights under the Copyright Act." The authors argued that the US district court in San Francisco must intervene and stop Nvidia because the company "has continued to make copies of the Infringed Works for training other models.""

Thursday, October 7, 2021

AI-ethics pioneer Margaret Mitchell on her five-year plan at open-source AI startup Hugging Face; Emerging Tech Brew, October 4, 2021

Hayden Field, Emerging Tech Brew; AI-ethics pioneer Margaret Mitchell on her five-year plan at open-source AI startup Hugging Face

"Hugging Face wants to bring these powerful tools to more people. Its mission: Help companies build, train, and deploy AI models—specifically natural language processing (NLP) systems—via its open-source tools, like Transformers and Datasets. It also offers pretrained models available for download and customization.

So what does it mean to play a part in “democratizing” these powerful NLP tools? We chatted with Mitchell about the split from Google, her plans for her new role, and her near-future predictions for responsible AI."