Showing posts with label Caselaw Access Project. Show all posts

Thursday, December 26, 2024

Harvard’s Library Innovation Lab launches Institutional Data Initiative; Harvard Law Today, December 12, 2024

Scott Young, Harvard Law Today; Harvard’s Library Innovation Lab launches Institutional Data Initiative

"At the Institutional Data Initiative (IDI), a new program hosted within the Harvard Law School Library, efforts are already underway to expand and enhance the data resources available for AI training. At the initiative’s public launch on Dec. 12, Library Innovation Lab faculty director, Jonathan Zittrain ’95, and IDI executive director, Greg Leppert, announced plans to expand the availability of public domain data from knowledge institutions — including the text of nearly one million books scanned at Harvard Library — to train AI models...

Harvard Law Today: What is the Institutional Data Initiative?

Greg Leppert: Our work at the Institutional Data Initiative is focused on finding ways to improve the accessibility of institutional data for all uses, artificial intelligence among them. Harvard Law School Library is a tremendous repository of public domain books, briefs, research papers, and so on. Regardless of how this information was initially memorialized — hardcover, softcover, parchment, etc. — a considerable amount has been converted into digital form. At the IDI, we are working to ensure these large data sets of public domain works, like the ones from the Law School library that comprise the Caselaw Access Project, are made open and accessible, especially for AI training. Harvard is not alone in terms of the scale and quality of its data; similar sets exist throughout our academic institutions and public libraries. AI systems are only as diverse as the data on which they’re trained, and these public domain data sets ought to be part of a healthy diet for future AI training.

HLT: What problem is the Institutional Data Initiative working to solve?

Leppert: As it stands, the data being used to train AI is often limited in terms of scale, scope, quality, and integrity. Various groups and perspectives are massively underrepresented in the data currently being used to train AI. As things stand, outliers will not be served by AI as well as they should be, and otherwise could be, by the inclusion of that underrepresented data. The country of Iceland, for example, undertook a national, government-led effort to make materials from their national libraries available for AI applications. That is because they were seriously concerned the Icelandic language and culture would not be represented in AI models. We are also working towards reaffirming Harvard, and other institutions, as the stewards of their collections. The proliferation of training sets based on public domain materials has been encouraging to see, but it’s important that this doesn’t leave the material vulnerable to critical omissions or alterations. For centuries, knowledge institutions have served as stewards of information for the purpose of promoting the public good and furthering the representation of diverse ideas, cultural groups, and ways of seeing the world. So, we believe these institutions are the exact kind of sources for AI training data if we want to optimize its ability to serve humanity. As it stands today, there is significant room for improvement."

Friday, November 9, 2018

In Favor of the Caselaw Access Project; The Harvard Crimson, November 7, 2018

The Crimson Editorial Board, The Harvard Crimson; In Favor of the Caselaw Access Project

"We hope that researchers will use these court opinions to further advance academic scholarship in this area. In particular, we hope that computer programmers are able to take full advantage of this repository of information. As Ziegler noted, no lawyer will be able to take full advantage of the millions of pages in the database, but computers have an advantage in this regard. Like Ziegler, we are hopeful that researchers using the database will be able to learn more about less understood aspects of the legal system — such as how courts influence each other and deal with disagreements. Those big-picture questions could not have been answered as well without the information provided by this new database.

This project is a resounding success for the Harvard Library, which happens also to be looking for a new leader. We hope that the person hired for the job will be similarly committed to projects that increase access to information — a key value that all who work in higher education should hold near and dear. In addition to maintaining the vast amounts of histories and stories already in the system, Harvard’s libraries should seek to illuminate content that may have been erased or obscured. There is always more to learn."

Thursday, November 8, 2018

Harvard Converts Millions of Legal Documents into Open Data; Government Technology, November 2, 2018

Theo Douglas, Government Technology; Harvard Converts Millions of Legal Documents into Open Data

[Kip Currier: Discovered the recent launch of this impressive Harvard University-anchored Caselaw Access Project while updating a lecture for next week on Open Data.

The free site provides access to highly technical data, full text cases, and even "quirky" but fascinating legal info...like the site's Gallery, highlighting instances in which "witchcraft" is mentioned in legal cases throughout the U.S.

Check out this new site...and spread the word about it!] 


"A new free website spearheaded by the Library Innovation Lab at the Harvard Law School makes available nearly 6.5 million state and federal cases dating from the 1600s to earlier this year, in an initiative that could alter and inform the future availability of similar areas of public-sector big data.

Led by the Lab, which was founded in 2010 as an arena for experimentation and exploration into expanding the role of libraries in the online era, the Caselaw Access Project went live Oct. 29 after five years of discussions, planning and digitization of roughly 100,000 pages per day over two years.

The effort was inspired by the Google Books Project; the Free Law Project, a California 501(c)(3) that provides free, public online access to primary legal sources, including so-called “slip opinions,” or early but nearly final versions of legal opinions; and the Legal Information Institute, a nonprofit service of Cornell University that provides free online access to key legal materials."