Showing posts with label APIs. Show all posts

Thursday, March 7, 2024

Introducing CopyrightCatcher, the first Copyright Detection API for LLMs; Patronus AI, March 6, 2024

Patronus AI; Introducing CopyrightCatcher, the first Copyright Detection API for LLMs

"Managing risks from unintended copyright infringement in LLM outputs should be a central focus for companies deploying LLMs in production.

  • On an adversarial copyright test designed by Patronus AI researchers, we found that state-of-the-art LLMs generate copyrighted content at an alarmingly high rate 😱
  • OpenAI’s GPT-4 produced copyrighted content on 44% of the prompts.
  • Mistral’s Mixtral-8x7B-Instruct-v0.1 produced copyrighted content on 22% of the prompts.
  • Anthropic’s Claude-2.1 produced copyrighted content on 8% of the prompts.
  • Meta’s Llama-2-70b-chat produced copyrighted content on 10% of the prompts.
  • Check out CopyrightCatcher, our solution to detect potential copyright violations in LLMs. Here’s the public demo, with open source model inference powered by Databricks Foundation Model APIs. 🔥

LLM training data often contains copyrighted works, and it is pretty easy to get an LLM to generate exact reproductions from these texts [1]. It is critical to catch these reproductions, since they pose significant legal and reputational risks for companies that build and use LLMs in production systems [2]. OpenAI, Anthropic, and Microsoft have all faced copyright lawsuits on LLM generations from authors [3], music publishers [4], and more recently, the New York Times [5].

To check whether LLMs respond to your prompts with copyrighted text, you can use CopyrightCatcher. It detects when LLMs generate exact reproductions of content from text sources like books, and highlights any copyrighted text in LLM outputs. Check out our public CopyrightCatcher demo here!"
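As a rough illustration of what "detecting exact reproductions" can mean, here is a minimal Python sketch that finds the longest word-for-word run an LLM output shares with a reference text. It is not Patronus AI's method or the CopyrightCatcher API, whose internals the excerpt does not describe. The function names `longest_verbatim_run` and `flag_reproduction` and the 8-word threshold are assumptions made up for this example.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(output, reference):
    """Return the longest run of consecutive words shared by both texts."""
    out_words = output.split()
    ref_words = reference.split()
    # autojunk=False stops difflib from ignoring very frequent words.
    matcher = SequenceMatcher(None, out_words, ref_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(ref_words))
    return out_words[match.a:match.a + match.size]


def flag_reproduction(output, reference, min_words=8):
    """Hypothetical rule: flag outputs that copy min_words or more in a row."""
    return len(longest_verbatim_run(output, reference)) >= min_words


# Public-domain reference text (Dickens), used here only as sample data.
reference = ("it was the best of times it was the worst of times "
             "it was the age of wisdom")
output = ("The model said: it was the best of times "
          "it was the worst of times indeed")

print(" ".join(longest_verbatim_run(output, reference)))
print(flag_reproduction(output, reference))
```

A real detector would need a large index of source texts and some normalization of case and punctuation. This sketch compares one output with one reference and splits on whitespace only.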

Wednesday, February 19, 2020

Oracle Files Response To Google and API Copyright - We Are All Doomed; i-Programmer, February 17, 2020

Mike James, i-Programmer; Oracle Files Response To Google and API Copyright - We Are All Doomed

"If I invent an API, of course I want it to be copyright. If I use an API then the last thing I want is for it to be copyright."

Tuesday, March 27, 2018

Google loses Android battle and could owe Oracle billions of dollars; CNN Money, March 27, 2018

Danielle Wiener-Bronner, CNN Money; Google loses Android battle and could owe Oracle billions of dollars

"Google isn't the only company that stands to lose from this decision. Many others rely on open-source software to develop their own platforms. Tuesday's ruling means that some will either have to pay to license certain software or develop their own from scratch.

"The decision is going to create a significant shift in how software is developed worldwide," Carani said. "It really means that copyright in this context has teeth."

"Sometimes free is not really free," he added."

Thursday, January 12, 2017

There is no shortage of open data. The question is, is anyone using it?; Computer Weekly, 1/9/17

Jonathan Stoneman, Computer Weekly; There is no shortage of open data. The question is, is anyone using it?

"Why publishing data is not enough
So there is no shortage of open data – but is anyone using it? The UK government’s data portal, data.gov.uk, currently shows 36,552 published datasets available, and just over 30,000 of those have an open government licence. There are 6,444 more without a licence and, intriguingly, a further 3,664 are listed as “unpublished”.
Some 1,401 government departments, including local government and agencies, are listed as “publishers”. Two million datasets were downloaded in 2016, but 11,481 – 31% of the whole collection – were not, not even once.
The UK government sees publication as a measure in itself."

Tuesday, May 31, 2016

Why Google’s fair use victory over Oracle matters; Guardian, 5/31/16

Pamela Samuelson, Guardian; Why Google’s fair use victory over Oracle matters:
"Further cascades of liability could have happened outside the Android ecosystem. An Oracle victory in the Google case would have emboldened other software firms with valuable APIs to become more aggressive in challenging unlicensed uses of those APIs. Someone wanting to develop a program to run on another firm’s platform must use that platform’s API to enable the second program to interoperate with the platform. (Think of an API as an information equivalent to the plug and socket configurations that are necessary for physical devices to interoperate with the electrical grid.) If the second program isn’t configured to send and receive information in the precise way that the first program’s API specifies, it just won’t work at all.
If the developer of an API owns copyright in that API, it can say no to any unlicensed use of it. Or it can condition its willingness to license use of the API on high royalties or impose restrictions on the other firm’s development (such as forbidding adaptation of the same program to run on other platforms).
Since 1992, courts have overwhelmingly rejected copyright claims in program interface specifications. These rulings are consistent with the prevailing norm in the computing industry since its early days: that it is OK to use another firm’s API as long as the second firm reimplements the API in independently written code. Over the past two decades, the software industry has thrived because the court rulings converged with industry norms that allow innovative software developers to build upon existing programs and platforms to offer consumers many choices of products for smart phones and other computing devices."

Monday, May 30, 2016

Why Google's victory in a copyright fight with Oracle is a big deal; Vox, 5/26/16

Timothy B. Lee, Vox; Why Google's victory in a copyright fight with Oracle is a big deal:
"Google's version of Java didn't reuse any code from Oracle's version. But to ensure compatibility, Google's version used functions with the same names and functionality.
This practice was widely viewed as legal within the software world at the time Google did it, but Oracle sued, arguing that this was copyright infringement. Oracle argued that the list of Java function names and features constitutes a creative work, and that Google infringed Oracle's copyright when it included functions with the same names and features.
Google argued that the list of function names, known as an application programming interface (API), was not protected by copyright law.
Google's defenders pointed to a landmark 1995 ruling in which an appeals court held that the software company Borland had not infringed copyright when it created a spreadsheet program whose menus were organized in the same way as the menus in the more popular spreadsheet Lotus 1-2-3.
The court held that the order of Lotus 1-2-3 menu items was an uncopyrightable "method of operation." And it concluded that giving Lotus exclusive ownership over its menu structure would harm the public...
Google believed that its own copying was directly analogous to what Borland had done. There were thousands of programmers with expertise in writing Java programs. By designing its platform to respond to the same set of programming commands as Oracle's Java system, Google allowed Java programmers to become Android programmers with minimal training — just as Borland's decision to copy Lotus's menu structure avoided unnecessary training for seasoned Lotus 1-2-3 users."
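The idea in the excerpt above, functions with the same names and behavior but independently written code, can be shown with a toy Python sketch. Everything in it is hypothetical: `OriginalPlatform`, `CompatiblePlatform`, and `client_program` are invented for illustration and have nothing to do with the actual Java or Android code at issue. The sketch takes no position on the legal question.

```python
class OriginalPlatform:
    """Stands in for the first firm's platform and its API."""

    def max(self, a, b):
        return a if a >= b else b

    def join(self, parts, sep):
        return sep.join(parts)


class CompatiblePlatform:
    """Same method names and arguments, but independently written bodies."""

    def max(self, a, b):
        return sorted([a, b])[-1]

    def join(self, parts, sep):
        result = ""
        for index, part in enumerate(parts):
            if index > 0:
                result += sep
            result += part
        return result


def client_program(platform):
    """A program written against the API runs unchanged on either platform."""
    biggest = platform.max(3, 7)
    return platform.join(["max", str(biggest)], "=")


print(client_program(OriginalPlatform()))
print(client_program(CompatiblePlatform()))
```

The client code depends only on the method names and what they return, which is why a programmer who knows one platform can write for the other with little retraining.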

Friday, May 27, 2016

Google Prevails as Jury Rebuffs Oracle in Code Copyright Case; New York Times, 5/26/16

Nick Wingfield and Quentin Hardy, New York Times; Google Prevails as Jury Rebuffs Oracle in Code Copyright Case:
"Some lawyers cautioned against viewing the verdict as a green light for the type of software development Google performed, saying that the earlier federal appeals court decision validated the idea that A.P.I.s can be copyrighted.
“I don’t think the industry can sit back and rely on this decision and exhale and say these things won’t be protected,” said Christopher Carani, a lawyer at McAndrews, Held & Malloy. “I think what you’re still going to see is a lot more attention paid to securing approval to use other copyrights before the fact.”
John Bergmayer, a senior staff attorney at Public Knowledge, a consumer rights group, cheered the verdict in a statement, but said he remained troubled by the implications of the earlier court decision. “Other courts of appeal should reject the Federal Circuit’s mistaken finding of copyrightability,” he said. “For now, though, the jury’s verdict is a welcome dose of common sense.”"

Wednesday, January 6, 2016

New York Public Library Invites a Deep Digital Dive; New York Times, 1/6/16

Jennifer Schuessler, New York Times; New York Public Library Invites a Deep Digital Dive:
"But the game is what you might call a marketing teaser for a major redistribution of property, digitally speaking: the release of more than 180,000 photographs, postcards, maps and other public-domain items from the library’s special collections in downloadable high-resolution files — along with an invitation to users to grab them and do with them whatever they please.
Digitization has been all the rage over the past decade, as libraries, museums and other institutions have scanned millions of items and posted them online. But the library’s initiative (nypl.org/publicdomain), which goes live on Wednesday, goes beyond the practical questions of how and what to digitize to the deeper one of what happens next...
A growing number of institutions have been rallying under the banner of “open content.” While the library’s new initiative represents one of the largest releases of visually rich material since the Rijksmuseum in Amsterdam began making more than 200,000 works available in high-quality scans free of charge in 2012, it’s notable for more than its size.
“It’s not just a data dump,” said Dan Cohen, the executive director of the Digital Public Library of America, a consortium that offers one-stop access to digitized holdings from more than 1,300 institutions.
The New York Public has “really been thinking about how they can get others to use this material,” Mr. Cohen continued. “It’s a next step that I would like to see more institutions take.”
Most items in the public-domain release have already been visible at the library’s digital collections portal. The difference is that the highest-quality files will now be available for free and immediate download, along with the programming interfaces, known as APIs, that allow developers to use them more easily."