
Tuesday, January 13, 2026

‘Clock Is Ticking’ For Creators On AI Content Copyright Claims, Experts Warn; Forbes, January 9, 2026

Rob Salkowitz, Forbes; ‘Clock Is Ticking’ For Creators On AI Content Copyright Claims, Experts Warn

"Despite this string of successes, creators like BT caution that content owners need to move quickly to secure any kind of terms. “A lot of artists have their heads in the sand with respect to AI,” he said. “The fact is, if they don’t come to some kind of agreement, they may end up with nothing.”

The concern is that AI models are increasingly being trained on synthetic data: that is, on the output of AI systems, rather than on content attributable to any individual creator or rights owner. Gartner estimates that 75% of AI training data in 2026 will be synthetic. That number could hit 100% by 2030. Once the tech companies no longer need human-produced content, they will stop paying for it.

“The quality of outputs from AI systems has been improving dramatically, which means that it is possible to train on synthetic data without risking model collapse,” said Dr. Daniela Braga, founder and CEO of the data training firm Defined.ai, in a separate interview at CES. “The window is definitely closing for individual rights owners to secure favorable terms.”

Other experts suggest that these claims may be overstated.

Braga says the best way creators can protect themselves is to do business with ethical companies willing to provide compensation for high-quality human-produced content and represent the superior value of that content to their customers. As models grow in capabilities, the need will shift from sheer volume of data to data that is appropriately tagged and annotated to fit easily into specific use cases.

There remain some profound questions around the sustainability of AI from a business standpoint, with demand for services among enterprise and consumers lagging the massive, and massively expensive, build-out of capacity. For some artists opposed to generative AI in its entirety, there may be the temptation to wait it out until the bubble bursts. After all, these artists created their work to be enjoyed by humans, not to be consumed in bulk by machines threatening their livelihoods. In light of those objections, the prospect of a meager payout might seem unappealing."

Thursday, January 9, 2025

Elon Musk says all human data for AI training ‘exhausted’; The Guardian, January 9, 2025

The Guardian; Elon Musk says all human data for AI training ‘exhausted’

"However, Musk also warned that AI models’ habit of generating “hallucinations” – a term for inaccurate or nonsensical output – was a danger for the synthetic data process.

He said in the livestreamed interview with Mark Penn, the chair of the advertising group Stagwell, that hallucinations had made the process of using artificial material “challenging” because “how do you know if it … hallucinated the answer or it’s a real answer”.

Andrew Duncan, the director of foundational AI at the UK’s Alan Turing Institute, said Musk’s comment tallied with a recent academic paper estimating that publicly available data for AI models could run out as soon as 2026. He added that over-reliance on synthetic data risked “model collapse”, a term referring to the outputs of models deteriorating in quality...

High-quality data, and control over it, is one of the legal battlegrounds in the AI boom. OpenAI admitted last year it would be impossible to create tools such as ChatGPT without access to copyrighted material, while the creative industries and publishers are demanding compensation for use of their output in the model training process."