Amanda Heidt, Nature; Intellectual property and data privacy: the hidden risks of AI
"Timothée Poisot, a computational ecologist at the University of Montreal in Canada, has made a successful career out of studying the world’s biodiversity. A guiding principle for his research is that it must be useful, Poisot says, as he hopes it will be later this year, when it joins other work being considered at the 16th Conference of the Parties (COP16) to the United Nations Convention on Biological Diversity in Cali, Colombia. “Every piece of science we produce that is looked at by policymakers and stakeholders is both exciting and a little terrifying, since there are real stakes to it,” he says.
But Poisot worries that artificial intelligence (AI) will interfere with the relationship between science and policy in the future. Chatbots such as Microsoft’s Bing, Google’s Gemini and ChatGPT, made by tech firm OpenAI in San Francisco, California, were trained using a corpus of data scraped from the Internet — which probably includes Poisot’s work. But because chatbots don’t often cite the original content in their outputs, authors are stripped of the ability to understand how their work is used and to check the credibility of the AI’s statements. It seems, Poisot says, that unvetted claims produced by chatbots are likely to make their way into consequential meetings such as COP16, where they risk drowning out solid science.
“There’s an expectation that the research and synthesis is being done transparently, but if we start outsourcing those processes to an AI, there’s no way to know who did what and where the information is coming from and who should be credited,” he says...
The technology underlying genAI, which was first developed at public institutions in the 1960s, has now been taken over by private companies, which usually have no incentive to prioritize transparency or open access. As a result, the inner mechanics of genAI chatbots are almost always a black box — a series of algorithms that aren’t fully understood, even by their creators — and attribution of sources is often scrubbed from the output. This makes it nearly impossible to know exactly what has gone into a model’s answer to a prompt. Organizations such as OpenAI have so far asked users to ensure that outputs used in other work do not violate laws, including intellectual-property and copyright regulations, or divulge sensitive information, such as a person’s location, gender, age, ethnicity or contact information. Studies have shown that genAI tools might do both [1,2]."