Intellectual Property and Open Movements (IP&OM)

Issues and developments related to Intellectual Property (e.g. Copyright, Fair Use, Patents, Trademarks, Trade Secrets) and Open Movements (e.g. Open Access, Open Data, Open Educational Resources (OER)), examined in the "Intellectual Property and Open Movements" and "Ethics of Data, Information, and Emerging Technologies" graduate courses I teach at the University of Pittsburgh School of Computing and Information. -- Kip Currier, PhD, JD

Tuesday, April 9, 2024

OpenAI’s GPT Store Is Triggering Copyright Complaints; Wired, April 4, 2024

Kate Knibbs, Wired; OpenAI’s GPT Store Is Triggering Copyright Complaints

"It is easy to find bots in the GPT Store whose descriptions suggest they might be tapping copyrighted content in some way, as Techcrunch noted in a recent article claiming OpenAI’s store was overrun with “spam.” Using copyrighted material without permission is permissible in some contexts but in others rightsholders can take legal action."

Thursday, March 7, 2024

Researchers tested leading AI models for copyright infringement using popular books, and GPT-4 performed worst; CNBC, March 6, 2024

Hayden Field, CNBC; Researchers tested leading AI models for copyright infringement using popular books, and GPT-4 performed worst

"The company, founded by ex-Meta researchers, specializes in evaluation and testing for large language models — the technology behind generative AI products.

Alongside the release of its new tool, CopyrightCatcher, Patronus AI released results of an adversarial test meant to showcase how often four leading AI models respond to user queries using copyrighted text.

The four models it tested were OpenAI’s GPT-4, Anthropic’s Claude 2, Meta’s Llama 2 and Mistral AI’s Mixtral.

“We pretty much found copyrighted content across the board, across all models that we evaluated, whether it’s open source or closed source,” Rebecca Qian, Patronus AI’s cofounder and CTO, who previously worked on responsible AI research at Meta, told CNBC in an interview.

Qian added, “Perhaps what was surprising is that we found that OpenAI’s GPT-4, which is arguably the most powerful model that’s being used by a lot of companies and also individual developers, produced copyrighted content on 44% of prompts that we constructed.”"

Thursday, February 29, 2024

The Intercept, Raw Story and AlterNet sue OpenAI for copyright infringement; The Guardian, February 28, 2024

Nick Robins-Early, The Guardian; The Intercept, Raw Story and AlterNet sue OpenAI for copyright infringement

"OpenAI and Microsoft are facing a fresh round of lawsuits from news publishers over allegations that their generative artificial intelligence products violated copyright laws and illegally trained by using journalists’ work. Three progressive US outlets – the Intercept, Raw Story and AlterNet – filed suits in Manhattan federal court on Wednesday, demanding compensation from the tech companies.

The news outlets claim that the companies in effect plagiarized copyright-protected articles to develop and operate ChatGPT, which has become OpenAI’s most prominent generative AI tool. They allege that ChatGPT was trained not to respect copyright, ignores proper attribution and fails to notify users when the service’s answers are generated using journalists’ protected work."

Saturday, February 17, 2024

The New York Times’ AI copyright lawsuit shows that forgiveness might not be better than permission; The Conversation, February 13, 2024

Peter Vaughan, Senior Lecturer, Nottingham Law School, Nottingham Trent University, The Conversation; The New York Times’ AI copyright lawsuit shows that forgiveness might not be better than permission

"The lawsuit also presents a novel argument – not advanced by other, similar cases – that’s related to something called “hallucinations”, where AI systems generate false or misleading information but present it as fact. This argument could in fact be one of the most potent in the case.

The NYT case in particular raises three interesting takes on the usual approach. First, that due to their reputation for trustworthy news and information, NYT content has enhanced value and desirability as training data for use in AI.

Second, that due to its paywall, the reproduction of articles on request is commercially damaging. Third, that ChatGPT “hallucinations” are causing reputational damage to the New York Times through, effectively, false attribution.

This is not just another generative AI copyright dispute. The first argument presented by the NYT is that the training data used by OpenAI is protected by copyright, and so they claim the training phase of ChatGPT infringed copyright. We have seen this type of argument run before in other disputes."

Thursday, February 15, 2024

Judge rejects most ChatGPT copyright claims from book authors; Ars Technica, February 13, 2024

ASHLEY BELANGER, Ars Technica; Judge rejects most ChatGPT copyright claims from book authors

"A US district judge in California has largely sided with OpenAI, dismissing the majority of claims raised by authors alleging that large language models powering ChatGPT were illegally trained on pirated copies of their books without their permission."

Thursday, December 28, 2023

Complaint: New York Times v. Microsoft & OpenAI, December 2023

Complaint:

THE NEW YORK TIMES COMPANY, Plaintiff,

v.

MICROSOFT CORPORATION, OPENAI, INC., OPENAI LP, OPENAI GP, LLC, OPENAI, LLC, OPENAI OPCO LLC, OPENAI GLOBAL LLC, OAI CORPORATION, LLC, and OPENAI HOLDINGS, LLC,

Defendants.

Wednesday, December 27, 2023

The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work; The New York Times, December 27, 2023

Michael M. Grynbaum and Ryan Mac, The New York Times; The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work

"The New York Times sued OpenAI and Microsoft for copyright infringement on Wednesday, opening a new front in the increasingly intense legal battle over the unauthorized use of published work to train artificial intelligence technologies.

The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. The lawsuit, filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information.

The suit does not include an exact monetary demand. But it says the defendants should be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.” It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times."

Sunday, December 17, 2023

Science fiction writers imagine a future in which AI doesn’t abuse copyright – or their generosity; The Register, December 15, 2023

Simon Sharwood, The Register; Science fiction writers imagine a future in which AI doesn’t abuse copyright – or their generosity

"Which is why several authors and the Authors Guild have launched lawsuits against OpenAI. It's also why the US Copyright Office in August 2023 launched an inquiry into copyright and artificial intelligence and invited public comments.

The SFWA took advantage of that offer, as did many others: the consultation has generated over 10,000 comments.

The Association's most recent submission – lodged on December 7 and noticed by Torrentfreak – notes that it is in "the unique position of representing many authors who have fought to make their work available for free for human readers."

"Over the last twenty years, many science fiction and fantasy authors of short fiction have embraced the open internet, believing that it is good for society and for a flourishing culture that art be available to their fellow human beings regardless of ability to pay," the submission states. But there's a difference between making a work free and giving it away.

"Being freely available has never meant abandoning the moral and legal rights of the authors, nor the obligation to enter into legal contracts to compensate authors for their work and spell out how it may and may not be used," the submission argues.

"The current content-scraping regime preys on that good-faith sharing of art as a connection between human minds and the hard work of building a common culture," the submission adds."

Monday, November 6, 2023

OpenAI offers to pay for ChatGPT customers’ copyright lawsuits; The Guardian, November 6, 2023

Blake Montgomery, The Guardian; OpenAI offers to pay for ChatGPT customers’ copyright lawsuits

"Rather than remove copyrighted material from ChatGPT’s training dataset, the chatbot’s creator is offering to cover its clients’ legal costs for copyright infringement suits.

OpenAI CEO Sam Altman said on Monday: “We can defend our customers and pay the costs incurred if you face legal claims around copyright infringement and this applies both to ChatGPT Enterprise and the API.” The compensation offer, which OpenAI is calling Copyright Shield, applies to users of the business tier, ChatGPT Enterprise, and to developers using ChatGPT’s application programming interface. Users of the free version of ChatGPT or ChatGPT+ were not included.

OpenAI is not the first to offer such legal protection, though as the creator of the wildly popular ChatGPT, which Altman said has 100 million weekly users, it is a heavyweight player in the industry. Google, Microsoft and Amazon have made similar offers to users of their generative AI software. Getty Images, Shutterstock and Adobe have extended similar financial liability protection for their image-making software."

Sunday, November 5, 2023

Artists may “poison” AI models before Copyright Office can issue guidance; Ars Technica, November 3, 2023

ASHLEY BELANGER, Ars Technica; Artists may “poison” AI models before Copyright Office can issue guidance

"Rather than rely on opting out of future AI training data sets—or, as OpenAI recommends, blocking AI makers' web crawlers from accessing and scraping their sites in the future—artists are figuring out how to manipulate their images to block AI models from correctly interpreting their content."

Tuesday, October 24, 2023

The fingerprints on a letter to Congress about AI; Politico, October 23, 2023

BRENDAN BORDELON, Politico; The fingerprints on a letter to Congress about AI

"The message in the open letter sent to Congress on Sept. 11 was clear: Don’t put new copyright regulations on artificial intelligence systems.

The letter’s signatories were real players, a broad coalition of think tanks, professors and civil-society groups with a stake in the growing debate about AI and copyright in Washington.

Undisclosed, however, were the fingerprints of Sy Damle, a tech-friendly Washington lawyer and former government official who works for top firms in the industry — including OpenAI, one of the top developers of cutting-edge AI models. Damle is currently representing OpenAI in ongoing copyright lawsuits...

The effort by an OpenAI lawyer to covertly sway Congress against new laws on AI and copyright comes in the midst of an escalating influence campaign — tied to OpenAI and other top AI firms — that critics fear is shifting Washington’s attention away from current AI harms and toward existential threats posed by future AI systems...

Many of the points made in the September letter echo those made recently by Damle in other venues, including an argument comparing the rise of AI to the invention of photography."

Wednesday, October 18, 2023

A.I. May Not Get a Chance to Kill Us if This Kills It First; Slate, October 17, 2023

SCOTT NOVER, Slate; A.I. May Not Get a Chance to Kill Us if This Kills It First

"There is a disaster scenario for OpenAI and other companies funneling billions into A.I. models: If a court found that a company was liable for copyright infringement, it could completely halt the development of the offending model."

Wednesday, September 20, 2023

Franzen, Grisham and Other Prominent Authors Sue OpenAI; The New York Times, September 20, 2023

Alexandra Alter and Elizabeth A. Harris, The New York Times; Franzen, Grisham and Other Prominent Authors Sue OpenAI

"A group of prominent novelists, including John Grisham, Jonathan Franzen and Elin Hilderbrand, are joining the legal battle against OpenAI over its chatbot technology, as fears about the encroachment of artificial intelligence on creative industries continue to grow.

More than a dozen authors filed a lawsuit against OpenAI on Tuesday, accusing the company, which has been backed with billions of dollars in investment from Microsoft, of infringing on their copyrights by using their books to train its popular ChatGPT chatbot. The complaint, which was filed along with the Authors Guild, said that OpenAI’s chatbots can now produce “derivative works” that can mimic and summarize the authors’ books, potentially harming the market for authors’ work, and that the writers were neither compensated nor notified by the company."

Tuesday, September 12, 2023

Another group of writers is suing OpenAI over copyright claims; The Verge, September 11, 2023

Emma Roth, The Verge; Another group of writers is suing OpenAI over copyright claims

"A group of writers is suing OpenAI over claims the company illegally used their works to train its AI ChatGPT chatbot, as reported earlier by Reuters. In a lawsuit filed on Friday, Michael Chabon, David Henry Hwang, Rachel Louise Snyder, and Ayelet Waldman allege OpenAI benefits and profits from the “unauthorized and illegal use” of their copyrighted content.

The lawsuit is seeking class-action status and calls out ChatGPT’s ability to summarize and analyze the content written by the authors, stating this “is only possible” if OpenAI trained its GPT large language model on their works. It adds that these outputs are actually “derivative” works that infringe on their copyrights."

Thursday, August 17, 2023

New York Times considers legal action against OpenAI as copyright tensions swirl; NPR, August 16, 2023

Bobby Allyn, NPR; New York Times considers legal action against OpenAI as copyright tensions swirl

"The New York Times and OpenAI could end up in court.

Lawyers for the newspaper are exploring whether to sue OpenAI to protect the intellectual property rights associated with its reporting, according to two people with direct knowledge of the discussions.

For weeks, The Times and the maker of ChatGPT have been locked in tense negotiations over reaching a licensing deal in which OpenAI would pay The Times for incorporating its stories in the tech company's AI tools, but the discussions have become so contentious that the paper is now considering legal action."

Monday, July 17, 2023

Thousands of authors urge AI companies to stop using work without permission; Morning Edition, NPR, July 17, 2023

Chloe Veltman, Morning Edition, NPR; Thousands of authors urge AI companies to stop using work without permission

"Thousands of writers including Nora Roberts, Viet Thanh Nguyen, Michael Chabon and Margaret Atwood have signed a letter asking artificial intelligence companies like OpenAI and Meta to stop using their work without permission or compensation."

Friday, July 14, 2023

"Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI; Quartz, July 10, 2023

Michelle Cheng, Quartz; "Shadow libraries" are at the heart of the mounting copyright lawsuits against OpenAI

"However, there are clues about these two data sets. “Books1” is linked to Project Gutenberg (an online e-book library with over 60,000 titles), a popular dataset for AI researchers to train their data on due to the lack of copyright, the filing states. “Books2” is estimated to contain about 294,000 titles, it notes.

Most of the “internet-based books corpora” is likely to come from shadow library websites such as Library Genesis, Z-Library, Sci-Hub, and Bibliotik. The books aggregated by these sites are available in bulk via torrent websites, which are known for hosting copyrighted materials.

What exactly are shadow libraries?

Shadow libraries are online databases that provide access to millions of books and articles that are out of print, hard to obtain, and paywalled. Many of these databases, which began appearing online around 2008, originated in Russia, which has a long tradition of sharing forbidden books, according to the magazine Reason.

Soon enough, these libraries became popular with cash-strapped academics around the world thanks to the high cost of accessing scholarly journals—with some reportedly going for as much as $500 for an entirely open-access article.

These shadow libraries are also called “pirate libraries” because they often infringe on copyrighted work and cut into the publishing industry’s profits. A 2017 Nielsen and Digimarc study (pdf) found that pirated books were “depressing legitimate book sales by as much as 14%.”"

Tuesday, July 11, 2023

What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat; Venture Beat, July 10, 2023

Sharon Goldman, Venture Beat; What Sarah Silverman’s lawsuit against OpenAI and Meta really means | The AI Beat

"Legal AI issues around copyright and ‘fair use’ growing louder

These legal issues around copyright and “fair use” are not going away — in fact, they go to the heart of what today’s LLMs are made of — that is, the training data. As I discussed last week, web scraping for massive amounts of data can arguably be described as the secret sauce of generative AI. AI chatbots like ChatGPT, LLaMA, Claude (from Anthropic) and Bard (from Google) can spit out coherent text because they were trained on massive corpora of data, mostly scraped from the internet. And as the size of today’s LLMs like GPT-4 have ballooned to hundreds of billions of tokens, so has the hunger for data."

Sunday, July 9, 2023

Sarah Silverman is suing OpenAI and Meta for copyright infringement; The Verge, July 9, 2023

Wes Davis, The Verge; Sarah Silverman is suing OpenAI and Meta for copyright infringement

"Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey, are suing OpenAI and Meta each in a US District Court over dual claims of copyright infringement.

The suits allege, among other things, that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally-acquired datasets containing their works, which they say were acquired from “shadow library” websites like Bibliotik, Library Genesis, Z-Library, and others, noting the books are “available in bulk via torrent systems.”"

Monday, July 3, 2023

ChatGPT Maker OpenAI Accused of Misusing Personal, Copyrighted Data; The San Francisco Standard, June 30, 2023

Kevin Truong, The San Francisco Standard; ChatGPT Maker OpenAI Accused of Misusing Personal, Copyrighted Data

"The suit alleges that ChatGPT utilizes "stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge."

The complaint states that by using this data, OpenAI and its related entities have enough information to replicate digital clones, encourage people's "professional obsolescence" and "obliterate privacy as we know it."

The complaint lists several plaintiffs identified by their initials, including a software engineer who claims that his online posts around technical questions could be used to eliminate his job, a 6-year-old who used a microphone to interact with ChatGPT and allegedly had his data harvested, and an actor who claims that OpenAI stole personal data from online applications to train its system."