Colloquium: How to read 15 million books in one sitting; Bill Schilit, Google Research; 2/3/10, 4 PM, Carnegie Mellon University, Newell Simon Hall 1305 (Michael Mauldin Auditorium):
"Abstract
Scanning books, magazines, and newspapers is widespread because people believe a great deal of the world's information still resides off-line. In general after works are scanned they are OCR'ed, indexed for search and processed to add links. In this talk I will describe a new approach to automatically add links by mining repeated passages. This technique connects elements that are semantically rich, so strong relations are made. Moreover, link targets point within rather than to the entire work, facilitating navigation. Our system has been run on a digital library of many millions of books (Google Book Search), has been used by thousands of people, and has generated the world's largest collection of quotations. I will also present a follow-on project based on the theory that authors copy passages from book to book because these quotations capture an idea particularly well: Jefferson on liberty; Stanton on women's rights; and Gibson on cyberpunk. These projects suggest that mining quotations for links and ideas is an important mechanism for understanding the knowledge contained in books.
(This work is in collaboration with Okan Kolak, Google Research and Google Book Search.)"
http://www.hcii.cmu.edu/news/seminar/how-read-15-million-books-one-sitting-or-mining-hypertext-quotations-and-ideas-very-lar
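The abstract does not say how the repeated passages are found, so the following is only a toy sketch of one common way to detect text shared between books: index every run of n consecutive words (a "shingle") and keep the runs that occur in more than one book. The function names, the book ids, the choice of n=5, and the sample strings are all assumptions made for illustration; none of them come from the talk or from Google Book Search.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens, n):
    """Yield (position, n-gram) pairs for every run of n consecutive tokens."""
    for i in range(len(tokens) - n + 1):
        yield i, tuple(tokens[i:i + n])


def find_repeated_passages(books, n=5):
    """Map each n-gram that occurs in two or more books to its locations.

    books: dict of hypothetical book id -> full text.
    Returns: dict of n-gram -> sorted list of (book_id, token_position).
    """
    index = defaultdict(list)
    for book_id, text in books.items():
        for pos, gram in shingles(tokenize(text), n):
            index[gram].append((book_id, pos))
    # Keep only n-grams seen in at least two distinct books.
    return {
        gram: sorted(locs)
        for gram, locs in index.items()
        if len({book_id for book_id, _ in locs}) > 1
    }


if __name__ == "__main__":
    # Made-up sample texts, used only to exercise the sketch.
    sample = {
        "a": "He wrote that the tree of liberty must be refreshed from time to time.",
        "b": "As the letter says, the tree of liberty must be refreshed, and so on.",
    }
    for gram, locs in find_repeated_passages(sample).items():
        print(" ".join(gram), locs)
```

Because each match records a token position inside a book, a link built from it can point within the work rather than at the whole volume, which is the navigation benefit the abstract mentions. A real system at library scale would also have to cope with OCR errors and merge overlapping shingles into full passages, which this sketch does not attempt.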
Issues and developments related to IP, AI, and OM, examined in the IP and tech ethics graduate courses I teach at the University of Pittsburgh School of Computing and Information. My Bloomsbury book "Ethics, Information, and Technology", coming in Summer 2025, includes major chapters on IP, AI, OM, and other emerging technologies (IoT, drones, robots, autonomous vehicles, VR/AR). Kip Currier, PhD, JD
Monday, February 1, 2010
Labels: Bill Schilit, data mining, Google, scanning print materials