Showing posts with label AI datasets. Show all posts

Tuesday, July 11, 2023

What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat; Venture Beat, July 10, 2023

Venture Beat; What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat

"Legal AI issues around copyright and ‘fair use’ growing louder 

These legal issues around copyright and “fair use” are not going away — in fact, they go to the heart of what today’s LLMs are made of — that is, the training data. As I discussed last week, web scraping for massive amounts of data can arguably be described as the secret sauce of generative AI. AI chatbots like ChatGPT, LLaMA, Claude (from Anthropic) and Bard (from Google) can spit out coherent text because they were trained on massive corpora of data, mostly scraped from the internet. And as the size of today’s LLMs like GPT-4 have ballooned to hundreds of billions of tokens, so has the hunger for data."

Wednesday, June 7, 2023

Japan Declares AI Training Data Fair Game and ‘Will Not Enforce Copyright’; PetaPixel, June 5, 2023

Matt Growcoot, PetaPixel; Japan Declares AI Training Data Fair Game and ‘Will Not Enforce Copyright’

"In the first such declaration of its kind, Japan has seemingly asserted that it will not enforce copyrights when it comes to training generative artificial intelligence (AI) programs.

Japan’s minister of education, culture, sports, science, and technology recently said that it is possible to take content from any source and use it for “information analysis.” 

According to a Japanese political website, Liberal Democrat minister Keiko Nagaoka clearly stated at a committee meeting that AI companies can use whatever data they want to train generative AI programs."