Sharon Goldman, VentureBeat; What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat
"Legal AI issues around copyright and ‘fair use’ growing louder
These legal issues around copyright and “fair use” are not going away — in fact, they go to the heart of what today’s LLMs are made of — that is, the training data. As I discussed last week, web scraping for massive amounts of data can arguably be described as the secret sauce of generative AI. AI chatbots like ChatGPT, LLaMA, Claude (from Anthropic) and Bard (from Google) can spit out coherent text because they were trained on massive corpora of data, mostly scraped from the internet. And as the size of today’s LLMs like GPT-4 has ballooned to hundreds of billions of tokens, so has the hunger for data."