Showing posts with label data privacy.

Monday, November 4, 2024

What AI knows about you; Axios, November 4, 2024

Ina Fried, Axios; What AI knows about you

"Most AI builders don't say where they are getting the data they use to train their bots and models — but legally they're required to say what they are doing with their customers' data.

The big picture: These data-use disclosures open a window onto the otherwise opaque world of Big Tech's AI brain-food fight.

  • In this new Axios series, we'll tell you, company by company, what all the key players are saying and doing with your personal information and content.

Why it matters: You might be just fine knowing that picture you just posted on Instagram is helping train the next generative AI art engine. But you might not — or you might just want to be choosier about what you share.

Zoom out: AI makers need an incomprehensibly gigantic amount of raw data to train their large language and image models. 

  • The industry's hunger has led to a data land grab: Companies are vying to teach their baby AIs using information sucked in from many different sources — sometimes with the owner's permission, often without it — before new laws and court rulings make that harder. 

Zoom in: Each Big Tech giant is building generative AI models, and many of them are using their customer data, in part, to train them.

  • In some cases it's opt-in, meaning your data won't be used unless you agree to it. In other cases it is opt-out, meaning your information will automatically get used unless you explicitly say no. 
  • These rules can vary by region, thanks to legal differences. For instance, Meta's Facebook and Instagram are "opt-out" — but you can only opt out if you live in Europe or Brazil.
  • In the U.S., California's data privacy law is among the laws responsible for requiring firms to say what they do with user data. In the EU, it's the GDPR."
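The opt-in/opt-out distinction described above comes down to a default: what happens to your data when you never touch the setting. A minimal, purely hypothetical Python sketch of that difference (the names and structure are invented for illustration and do not reflect any company's actual consent systems):

    # Hypothetical illustration only -- not any company's real consent API.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class User:
        consent_choice: Optional[bool] = None  # None means the user never chose

    def may_use_for_training(user: User, regime: str) -> bool:
        """Return True if the user's data may be used for model training."""
        if regime == "opt-in":
            # Opt-in: data is used only if the user explicitly agreed.
            return user.consent_choice is True
        if regime == "opt-out":
            # Opt-out: data is used unless the user explicitly declined.
            return user.consent_choice is not False
        raise ValueError(f"unknown consent regime: {regime}")

    silent_user = User()  # a user who never changed the default settings
    print(may_use_for_training(silent_user, "opt-in"))   # False
    print(may_use_for_training(silent_user, "opt-out"))  # True

Under opt-in, silence means the data stays out; under opt-out, silence means it goes in.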

Thursday, September 5, 2024

Intellectual property and data privacy: the hidden risks of AI; Nature, September 4, 2024

Amanda Heidt, Nature; Intellectual property and data privacy: the hidden risks of AI

"Timothée Poisot, a computational ecologist at the University of Montreal in Canada, has made a successful career out of studying the world’s biodiversity. A guiding principle for his research is that it must be useful, Poisot says, as he hopes it will be later this year, when it joins other work being considered at the 16th Conference of the Parties (COP16) to the United Nations Convention on Biological Diversity in Cali, Colombia. “Every piece of science we produce that is looked at by policymakers and stakeholders is both exciting and a little terrifying, since there are real stakes to it,” he says.

But Poisot worries that artificial intelligence (AI) will interfere with the relationship between science and policy in the future. Chatbots such as Microsoft’s Bing, Google’s Gemini and ChatGPT, made by tech firm OpenAI in San Francisco, California, were trained using a corpus of data scraped from the Internet — which probably includes Poisot’s work. But because chatbots don’t often cite the original content in their outputs, authors are stripped of the ability to understand how their work is used and to check the credibility of the AI’s statements. It seems, Poisot says, that unvetted claims produced by chatbots are likely to make their way into consequential meetings such as COP16, where they risk drowning out solid science.

“There’s an expectation that the research and synthesis is being done transparently, but if we start outsourcing those processes to an AI, there’s no way to know who did what and where the information is coming from and who should be credited,” he says...

The technology underlying genAI, which was first developed at public institutions in the 1960s, has now been taken over by private companies, which usually have no incentive to prioritize transparency or open access. As a result, the inner mechanics of genAI chatbots are almost always a black box — a series of algorithms that aren’t fully understood, even by their creators — and attribution of sources is often scrubbed from the output. This makes it nearly impossible to know exactly what has gone into a model’s answer to a prompt. Organizations such as OpenAI have so far asked users to ensure that outputs used in other work do not violate laws, including intellectual-property and copyright regulations, or divulge sensitive information, such as a person’s location, gender, age, ethnicity or contact information. Studies have shown that genAI tools might do both [1,2]."