Showing posts with label publishers. Show all posts

Friday, December 6, 2024

Internet Archive Copyright Case Ends Without Supreme Court Review; Publishers Weekly, December 5, 2024

 Andrew Albanese, Publishers Weekly; Internet Archive Copyright Case Ends Without Supreme Court Review

"After more than four years of litigation, a closely watched copyright case over the Internet Archive’s scanning and lending of library books is finally over after Internet Archive officials decided against exercising their last option, an appeal to the Supreme Court. The deadline to file an appeal was December 3.

With a consent judgment already entered to settle claims in the case, the official end of the litigation now triggers an undisclosed monetary payment to the plaintiff publishers, which, according to the Association of American Publishers, will “substantially” cover the publishers’ attorney fees and costs in the litigation."

Thursday, November 21, 2024

OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit; TechCrunch, November 20, 2024

Kyle Wiggers, TechCrunch; OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit

"OpenAI tried to recover the data — and was mostly successful. However, because the folder structure and file names were “irretrievably” lost, the recovered data “cannot be used to determine where the news plaintiffs’ copied articles were used to build [OpenAI’s] models,” per the letter.

“News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” counsel for The Times and Daily News wrote. “The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today.”

The plaintiffs’ counsel makes clear that they have no reason to believe the deletion was intentional. But they do say the incident underscores that OpenAI “is in the best position to search its own datasets” for potentially infringing content using its own tools."

Tuesday, November 5, 2024

Penguin Random House books now explicitly say ‘no’ to AI training; The Verge, October 18, 2024

Emma Roth, The Verge; Penguin Random House books now explicitly say ‘no’ to AI training

"Book publisher Penguin Random House is putting its stance on AI training in print. The standard copyright page on both new and reprinted books will now say, “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems,” according to a report from The Bookseller spotted by Gizmodo. 

The clause also notes that Penguin Random House “expressly reserves this work from the text and data mining exception” in line with the European Union’s laws. The Bookseller says that Penguin Random House appears to be the first major publisher to account for AI on its copyright page. 

What gets printed on that page might be a warning shot, but it also has little to do with actual copyright law. The amended page is sort of like Penguin Random House’s version of a robots.txt file, which websites will sometimes use to ask AI companies and others not to scrape their content. But robots.txt isn’t a legal mechanism; it’s a voluntarily-adopted norm across the web. Copyright protections exist regardless of whether the copyright page is slipped into the front of the book, and fair use and other defenses (if applicable!) also exist even if the rights holder says they do not."

Friday, October 18, 2024

Penguin Random House underscores copyright protection in AI rebuff; The Bookseller, October 18, 2024

 MATILDA BATTERSBY, The Bookseller; Penguin Random House underscores copyright protection in AI rebuff

"The world’s biggest trade publisher has changed the wording on its copyright pages to help protect authors’ intellectual property from being used to train large language models (LLMs) and other artificial intelligence (AI) tools, The Bookseller can exclusively reveal.

Penguin Random House (PRH) has amended its copyright wording across all imprints globally, confirming it will appear “in imprint pages across our markets”. The new wording states: “No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems”, and will be included in all new titles and any backlist titles that are reprinted.

The statement also “expressly reserves [the titles] from the text and data mining exception”, in accordance with a European Parliament directive.

The move specifically to ban the use of its titles by AI firms for the development of chatbots and other digital tools comes amid a slew of copyright infringement cases in the US and reports that large tranches of pirated books have already been used by tech companies to train AI tools. In 2024, several academic publishers including Taylor & Francis, Wiley and Sage have announced partnerships to license content to AI firms.

PRH is believed to be the first of the Big Five anglophone trade publishers to amend its copyright information to reflect the acceleration of AI systems and the alleged reliance by tech companies on using published work to train language models."

Friday, October 11, 2024

Why The New York Times' lawyers are inspecting OpenAI's code in a secretive room; Business Insider, October 10, 2024

Business Insider; Why The New York Times' lawyers are inspecting OpenAI's code in a secretive room

"OpenAI is worth $157 billion largely because of the success of ChatGPT. But to build the chatbot, the company trained its models on vast quantities of text it didn't pay a penny for.

That text includes stories from The New York Times, articles from other publications, and an untold number of copyrighted books.

The examination of the code for ChatGPT, as well as for Microsoft's artificial intelligence models built using OpenAI's technology, is crucial for the copyright infringement lawsuits against the two companies.

Publishers and artists have filed about two dozen major copyright lawsuits against generative AI companies. They are out for blood, demanding a slice of the economic pie that made OpenAI the dominant player in the industry and which pushed Microsoft's valuation beyond $3 trillion. Judges deciding those cases may carve out the legal parameters for how large language models are trained in the US."

Sunday, September 29, 2024

AI could be an existential threat to publishers – that’s why Mumsnet is fighting back; The Guardian, September 28, 2024

The Guardian; AI could be an existential threat to publishers – that’s why Mumsnet is fighting back

"After nearly 25 years as a founder of Mumsnet, I considered myself pretty unshockable when it came to the workings of big tech. But my jaw hit the floor last week when I read that Google was pushing to overhaul UK copyright law in a way that would allow it to freely mine other publishers’ content for commercial gain without compensation.

At Mumsnet, we’ve been on the sharp end of this practice, and have recently launched the first British legal action against the tech giant OpenAI. Earlier in the year, we became aware that it was scraping our content – presumably to train its large language model (LLM). Such scraping without permission is a breach of copyright laws and explicitly of our terms of use, so we approached OpenAI and suggested a licensing deal. After lengthy talks (and signing a non-disclosure agreement), it told us it wasn’t interested, saying it was after “less open” data sources...

If publishers wither and die because the AIs have hoovered up all their traffic, then who’s left to produce the content to feed the models? And let’s be honest – it’s not as if these tech giants can’t afford to properly compensate publishers. OpenAI is currently fundraising to the tune of $6.5bn, the single largest venture capital round of all time, valuing the enterprise at a cool $150bn. In fact, it has just been reported that the company is planning to change its structure and become a for-profit enterprise...

I’m not anti-AI. It plainly has the potential to advance human progress and improve our lives in myriad ways. We used it at Mumsnet to build MumsGPT, which uncovers and summarises what parents are thinking about – everything from beauty trends to supermarkets to politicians – and we licensed OpenAI’s API (application programming interface) to build it. Plus, we think there are some very good reasons why these AI models should ingest Mumsnet’s conversations to train their models. The 6bn-plus words on Mumsnet are a unique record of 24 years of female interaction about everything from global politics to relationships with in-laws. By contrast, most of the content on the web was written by and for men. AI models have misogyny baked in and we’d love to help counter their gender bias.

But Google’s proposal to change our laws would allow billion-dollar companies to waltz untrammelled over any notion of a fair value exchange in the name of rapid “development”. Everything that’s unique and brilliant about smaller publisher sites would be lost, and a handful of Silicon Valley giants would be left with even more control over the world’s content and commerce."

Monday, September 9, 2024

Internet Archive Court Loss Leaves Higher Ed in Gray Area; Inside Higher Ed, September 9, 2024

 Lauren Coffey, Inside Higher Ed; Internet Archive Court Loss Leaves Higher Ed in Gray Area

"Pandemic-era library programs that helped students access books online could be potentially threatened by an appeals court ruling last week. 

Libraries across the country, from Carnegie Mellon University to the University of California system, turned to what’s known as a digital or controlled lending program in 2020, which gave students a way to borrow books that weren’t otherwise available. Those programs are small in scale and largely experimental but part of a broader shift in modernizing the university library.

But the appeals court ruling could upend those programs...

Still, librarians at colleges and elsewhere, along with other experts, feared that the long-running legal fight between the Internet Archive and leading publishers could imperil the ability of libraries to own and preserve books, among other ramifications."

Thursday, August 29, 2024

OpenAI Pushes Prompt-Hacking Defense to Deflect Copyright Claims; Bloomberg Law, August 29, 2024

 Annelise Gilbert, Bloomberg Law; OpenAI Pushes Prompt-Hacking Defense to Deflect Copyright Claims

"Diverting attention to hacking claims or how many tries it took to obtain exemplary outputs, however, avoids addressing most publishers’ primary allegation: AI tools illegally trained on copyrighted works."

Tuesday, July 23, 2024

The Data That Powers A.I. Is Disappearing Fast; The New York Times, July 19, 2024

Kevin Roose, The New York Times; The Data That Powers A.I. Is Disappearing Fast

"For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.

Now, that data is drying up.

Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group.

The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an “emerging crisis in consent,” as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt."

Tuesday, July 9, 2024

Record labels sue AI music startups for copyright infringement; WBUR Here & Now, July 8, 2024

 WBUR Here & Now; Record labels sue AI music startups for copyright infringement

"Major record labels including Sony, Universal Music Group and Warner are suing two music startups that use artificial intelligence. The labels say Suno and Udio rely on mass copyright infringement, echoing similar complaints from authors, publishers and artists who argue that generative AI infringes on copyright.

Here & Now's Lisa Mullins discusses the cases with Ina Fried, chief technology correspondent for Axios."

Monday, July 1, 2024

Internet Archive forced to remove 500,000 books after publishers’ court win; Ars Technica, June 21, 2024

Ars Technica; Internet Archive forced to remove 500,000 books after publishers’ court win

"As a result of book publishers successfully suing the Internet Archive (IA) last year, the free online library that strives to keep growing online access to books recently shrank by about 500,000 titles.

IA reported in a blog post this month that publishers abruptly forcing these takedowns triggered a "devastating loss" for readers who depend on IA to access books that are otherwise impossible or difficult to access.

To restore access, IA is now appealing, hoping to reverse the prior court's decision by convincing the US Court of Appeals in the Second Circuit that IA's controlled digital lending of its physical books should be considered fair use under copyright law."

Sunday, June 30, 2024

Tech companies battle content creators over use of copyrighted material to train AI models; The Canadian Press via CBC, June 30, 2024

Anja Karadeglija, The Canadian Press via CBC; Tech companies battle content creators over use of copyrighted material to train AI models

"Canadian creators and publishers want the government to do something about the unauthorized and usually unreported use of their content to train generative artificial intelligence systems.

But AI companies maintain that using the material to train their systems doesn't violate copyright, and say limiting its use would stymie the development of AI in Canada.

The two sides are making their cases in recently published submissions to a consultation on copyright and AI being undertaken by the federal government as it considers how Canada's copyright laws should address the emergence of generative AI systems like OpenAI's ChatGPT."

Tuesday, June 4, 2024

Google’s A.I. Search Leaves Publishers Scrambling; The New York Times, June 1, 2024

 Nico Grant and , The New York Times; Google’s A.I. Search Leaves Publishers Scrambling

"In May, Google announced that the A.I.-generated summaries, which compile content from news sites and blogs on the topic being searched, would be made available to everyone in the United States. And that change has Mr. Pine and many other publishing executives worried that the paragraphs pose a big danger to their brittle business model, by sharply reducing the amount of traffic to their sites from Google.

“It potentially chokes off the original creators of the content,” Mr. Pine said. The feature, AI Overviews, felt like another step toward generative A.I. replacing “the publications that they have cannibalized,” he added."

Thursday, May 23, 2024

OpenAI Strikes a Deal to License News Corp Content; The New York Times, May 22, 2024

Katie Robertson, The New York Times; OpenAI Strikes a Deal to License News Corp Content

"News Corp, the Murdoch-owned empire of publications like The Wall Street Journal and The New York Post, announced on Wednesday that it had agreed to a deal with OpenAI to share its content to train and service artificial intelligence chatbots.

News Corp said the multiyear agreement would allow OpenAI to use current and archived news content from News Corp’s major news outlets, including brands in the United States, United Kingdom and Australia as well as MarketWatch and Barron’s. The agreement does not include content from News Corp’s other businesses, such as its digital real estate services or HarperCollins...

Many publishers have worried about the threat to their business posed by generative A.I., which uses copyrighted content to train its models and service its chatbots."

Wednesday, March 27, 2024

Amicus Briefs Filed in Internet Archive Copyright Case; Publishers Weekly, March 25, 2024

Andrew Albanese, Publishers Weekly; Amicus Briefs Filed in Internet Archive Copyright Case

"Internet Archive lawyers filed their principal appeal brief on December 15, and 11 amicus briefs were filed in support of the Internet Archive a week later, in December, representing librarians and library associations, authors, public advocacy groups, law professors, and IP scholars, although some of the IA amicus briefs are presented as neutral.

The briefs are the latest development in the long-running copyright infringement case and come a year after a ruling by judge John G. Koeltl on March 24, 2023 that emphatically rejected the IA’s fair use defense, finding the scanning and lending of print library books under a protocol known as “controlled digital lending” to be copyright infringement.

The Internet Archive’s reply brief is now due on April 19, and oral arguments are expected to be set for this fall."

Thursday, October 19, 2023

AI is learning from stolen intellectual property. It needs to stop.; The Washington Post, October 19, 2023

William D. Cohan, The Washington Post; AI is learning from stolen intellectual property. It needs to stop.

"The other day someone sent me the searchable database published by Atlantic magazine of more than 191,000 e-books that have been used to train the generative AI systems being developed by Meta, Bloomberg and others. It turns out that four of my seven books are in the data set, called Books3. Whoa.

Not only did I not give permission for my books to be used to generate AI products, but I also wasn’t even consulted about it. I had no idea this was happening. Neither did my publishers, Penguin Random House (for three of the books) and Macmillan (for the other one). Neither my publishers nor I were compensated for use of my intellectual property. Books3 just scraped the content away for free, with Meta et al. profiting merrily along the way. And Books3 is just one of many pirated collections being used for this purpose...

This is wholly unacceptable behavior. Our books are copyrighted material, not free fodder for wealthy companies to use as they see fit, without permission or compensation. Many, many hours of serious research, creative angst and plain old hard work go into writing and publishing a book, and few writers are compensated like professional athletes, Hollywood actors or Wall Street investment bankers. Stealing our intellectual property hurts."

Thursday, September 21, 2023

Publishers settle copyright infringement lawsuit with ResearchGate; Chemistry World, September 18, 2023

Chemistry World; Publishers settle copyright infringement lawsuit with ResearchGate

"Lisa Janicke Hinchliffe, a librarian and professor of information science at the University of Illinois Urbana-Champaign, says the settlement agreement ‘signals that ResearchGate has completed its journey from disrupter to partner within the scholarly communications ecosystem’. She notes that Elsevier and ACS have been using ResearchGate’s content blocking technology since at least early 2022, which indicates that ‘a more collaborative relationship’ has been in development for some time."

Tuesday, September 19, 2023

Bizarre AI-generated products are in stores. Here’s how to avoid them.; The Washington Post, September 18, 2023

The Washington Post; Bizarre AI-generated products are in stores. Here’s how to avoid them.

"Copyright and intellectual property issues around AI are still in the air...

The Authors Guild, which represents many authors whose work has been used to train AI tools, is asking for legislation and pushing companies to disclose when a book is written by AI...

“We see it as consumer protection, but it’s also a way to insulate the book marketplace because otherwise, you’ll just see an influx of AI-generated content on a platform like Kindle,” said Mary Rasenberger, chief executive of the Authors Guild. “It will take away from the market [demand] for human creative works.”

Rasenberger said that she doesn’t think AI can be held off forever and even sees a place for it as a useful tool for writers. The guild’s goal is to make sure AI is regulated, licensed and legitimate, with money going back to authors, she said."

Monday, September 18, 2023

Four large US publishers sue ‘shadow library’ for alleged copyright infringement; The Guardian, September 15, 2023

The Guardian; Four large US publishers sue ‘shadow library’ for alleged copyright infringement

"Four leading US publishers have sued an online “shadow library” that allows visitors to download textbooks and other copyrighted materials free.

Cengage, Macmillan Learning, McGraw Hill and Pearson Education filed the suit against Library Genesis, also known as LibGen, in Manhattan federal court, citing “extensive violations” of copyright law.

LibGen operates a collection of different domains that allow users to search for and download pdf versions of books. The suit, filed on Thursday, said LibGen holds more than 20,000 files published by the four suing companies."

Tuesday, September 12, 2023

Internet Archive Files Appeal in Copyright Infringement Case; Publishers Weekly, September 11, 2023

Andrew Albanese, Publishers Weekly; Internet Archive Files Appeal in Copyright Infringement Case

"As expected, the Internet Archive this week submitted its appeal in Hachette v. Internet Archive, the closely-watched copyright case involving the scanning and digital lending of library books.

In a brief notice filed with the court, IA lawyers are seeking review by the Second Circuit court of appeals in New York of the "August 11, 2023 Judgment and Permanent Injunction; the March 24, 2023 Opinion and Order Granting Plaintiffs’ Motion for Summary Judgment and Denying Defendant’s Motion for Summary Judgment; and from any and all orders, rulings, findings, and/or conclusions adverse to Defendant Internet Archive."

The notice of appeal comes right at the 30-day deadline—a month to the day after judge John G. Koeltl approved and entered a negotiated consent judgment in the case which declared the IA's scanning and lending program to be copyright infringement, as well as a permanent injunction that, among its provisions, bars the IA from lending unauthorized scans of the plaintiffs' in-copyright, commercially available books that are available in digital editions."