Tuesday, July 11, 2023

What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat; Venture Beat, July 10, 2023

Sharon Goldman, Venture Beat; What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat

"Legal AI issues around copyright and ‘fair use’ growing louder

These legal issues around copyright and “fair use” are not going away — in fact, they go to the heart of what today’s LLMs are made of — that is, the training data. As I discussed last week, web scraping for massive amounts of data can arguably be described as the secret sauce of generative AI. AI chatbots like ChatGPT, LLaMA, Claude (from Anthropic) and Bard (from Google) can spit out coherent text because they were trained on massive corpora of data, mostly scraped from the internet. And as the size of today’s LLMs like GPT-4 have ballooned to hundreds of billions of tokens, so has the hunger for data."

Wednesday, June 7, 2023

Japan Declares AI Training Data Fair Game and ‘Will Not Enforce Copyright’; PetaPixel, June 5, 2023

Matt Growcoot, PetaPixel; Japan Declares AI Training Data Fair Game and ‘Will Not Enforce Copyright’

"In the first such declaration of its kind, Japan has seemingly asserted that it will not enforce copyrights when it comes to training generative artificial intelligence (AI) programs.

Japan’s minister of education, culture, sports, science, and technology recently said that it is possible to take content from any source and use it for “information analysis.”

According to a Japanese political website, Liberal Democrat minister Keiko Nagoaka clearly stated at a committee meeting that AI companies can use whatever data they want to train generative AI programs."

Sunday, April 30, 2023

A Photographer Tried to Get His Photos Removed from an AI Dataset. He Got an Invoice Instead.; Vice, April 28, 2023

Chloe Xiang, Vice; A Photographer Tried to Get His Photos Removed from an AI Dataset. He Got an Invoice Instead.

"The legality of using copyrighted material to train AI is still very contentious and there has not yet been a precedent case that can be used to determine the validity of either side of the case."