Jon Stokes, ArsTechnica.com; Google's count of 130 million books is probably bunk:
""After we exclude serials, we can finally count all the books in the world," wrote Google's Leonid Taycher in a GBS blog post. "There are 129,864,880 of them. At least until Sunday."
It's a large, official-sounding number, and the explanation for how Google arrived at it involves a number of acronyms and terms that will be unfamiliar to most of those who read the post. It's also quite likely to be complete bunk...
But the problem with Google's count, as is clear from the GBS count post itself, is that GBS's metadata collection is riddled with errors of every sort. Or, as linguist and GBS critic Geoff Nunberg put it last year in a blog post, Google's metadata is a "train wreck: a mish-mash wrapped in a muddle wrapped in a mess."
Indeed, a simple Google search for "google books metadata" (sans quotes) will turn up mostly criticisms and caterwauling by dismayed linguists, librarians, and other scholars at the terrible state of Google's metadata. Erroneous dates are pervasive, to the point that you can find many GBS references to historical figures and technologies in books that Google dates to well before the people or technologies existed. The classifications are a mess, and Nunberg's presentation points out that the first 10 classifications for Walt Whitman's "Leaves of Grass" classify it as Juvenile Nonfiction, Poetry, Fiction, Literary Criticism, Biography & Autobiography, Counterfeits and Counterfeiting. Then there are authors that are missing or misattributed, and titles that bear no relation to the linked work."
http://arstechnica.com/science/news/2010/08/googles-count-of-130-million-books-is-probably-bunk.ars
Issues and developments related to IP, AI, and OM, examined in the IP and tech ethics graduate courses I teach at the University of Pittsburgh School of Computing and Information. My Bloomsbury book "Ethics, Information, and Technology", coming in Summer 2025, includes major chapters on IP, AI, OM, and other emerging technologies (IoT, drones, robots, autonomous vehicles, VR/AR). Kip Currier, PhD, JD
Showing posts with label Geoff Nunberg. Show all posts
Tuesday, August 24, 2010
Saturday, September 5, 2009
The Cookie Before Dinner; Open Book Alliance, 8/31/09
Peter Brantley via Open Book Alliance; The Cookie Before Dinner:
"Last Friday, I was fortunate to participate in an event on the Google Book settlement and the Future of Information Access. Hosted by the UC Berkeley School of Information, the event brought together a couple hundred academic, legal, and industry minds to discuss the promise and the pitfalls of the controversial settlement proposal between Google, the Authors’ Guild, and the Association of American Publishers.
My takeaway from the panels and hallway conversations is that the academic and scholarly community – among the parties who would be most affected by this settlement – are fairly critical of the settlement proposal in its current form.
Four issues in particular kept cropping up during the panels – the utility of the service that Google says it will deliver; the diminished competition that will occur as a result of the de facto exclusivity offered by the settlement; significant privacy issues that are yet unanswered by Google; and the quality of the books and their descriptive metadata that Google intends to offer.
On the last point, Geoff Nunberg from the School of Information gave what may have been the most interesting and entertaining presentation of the day, highlighting a sampling of the errors in Google’s book scanning efforts to date. In his words, “GBS (Google Book Settlement) metadata are awful.”
Media coverage of the event highlighted the point that many in the academic community seem to agree on – while the digitization of books can offer tremendous benefits to all, there are better, fairer ways to go about making that future a reality. We don’t have to grab the cookie that’s offered to us before dinner."
http://www.openbookalliance.org/2009/08/the-cookie-before-dinner/
Friday, August 28, 2009
Google Book Search? Try Google Library; CBS News, 8/27/09
CNet's Tom Krazit via CBS News; Google Book Search? Try Google Library:
Plan to Bring Millions of Books Online Raises Concerns over Privacy, Quality and Motive
"There's a sense among several of those planning to speak at Friday's conference that an Internet corporation--even one sworn to "do no evil"--does not necessarily share the same values and principles that librarians rabidly defend. And left unsaid, but by no means absent, is the growing scrutiny paid this year to Google's dominant position in the Internet search market and how that power squares with Google Books and the publishing industry...
Universities do have an alternative in the HathiTrust, a digital library project that counts UC Berkeley and the University of Michigan--also a close partner of Google's--among its partners. That service lacks the scope of what Google is potentially entitled to scan, but it curates the material in a fashion that's better suited to the needs of the academic community.
That's good, because at the moment, Google Book Search is almost laughably unusable for serious research, UC Berkeley's Nunberg said. For example, he pointed out that the Charles Dickens classic "A Tale of Two Cities" is listed in Google Book Search as having been published in 1800; Dickens was born in 1812.
Nunberg plans to speak out on the quality issues with Google Book Search, although he readily concedes that the product was not designed for the needs of academics and scholars. But that only underscores the point: if Google Book Search is the only way to obtain a digital copy of a book 100 years into the future, scholars will have to depend on it for research, he said...
"There's a lot of questions about how they will balance (their) mandate as a for-profit corporation and their mission to provide universal access to information," [ALA's Angela] Maycock said. If it really wants to make the controversy over this settlement go away, Google needs to embrace "the ethical framework that libraries operate under," she said."
http://www.cbsnews.com/stories/2009/08/27/tech/cnettechnews/main5269257.shtml
Librarians apply scrutiny to Google Books at Berkeley con; ZDNet Government, 8/27/09
Richard Koman via ZDNet Government; Librarians apply scrutiny to Google Books at Berkeley con:
"If you’re in the Bay Area and you want a full day of wonky debate, check out UC Berkeley’s Google Books Conference. It features panels on how the Google Books settlement affect data mining, privacy, information quality and public access.
The conference comes hard on the heels of the formation of the Open Book Alliance, an organization driven by the Internet Archive and including Amazon, Yahoo and Microsoft, as well as library and small publishing groups among its members. Most of the speakers are opposed to the deal but Google’s Tom [sic] Clancy will be there to make the company’s argument....
But if Google is the last library, as Berkeley linguist Geoff Nunberg says, it’s a pretty bad one. That means serious library science must be applied to the online collection before we should outsource the history of human (or at least Western) knowledge to Google:
Google Book Search is almost laughably unusable for serious research, UC Berkeley’s Nunberg said. For example, he pointed out that the Charles Dickens classic “A Tale of Two Cities” is listed in Google Book Search as having been published in 1800; Dickens was born in 1812."
http://government.zdnet.com/?p=5309
Google Book Search - Is it The Last Library?; Register, 8/29/09
Cade Metz via Register; Google Book Search - Is it The Last Library?:
"Geoff Nunberg, one of America's leading linguistics researchers, laid this rather ominous tag on Google's controversial book-scanning project amidst an amusingly-heated debate this afternoon on the campus of the University of California, Berkeley.
"This is likely to be The Last Library," Nunberg said during a University conference dedicated to Google Book Search and the company's accompanying $125m settlement with US authors and publishers. "Nobody is very likely to scan these books again. The cost of scanning isn't going to come down. There's no Moore's Law for scanning.
"We don't know who's going to be running these files 100 years from now. It may be Google. It may be News Corp. It may WalMart. But we can say with some certainty that 100 years from now, these are the very files scholars will be using."...
Predictably, Google Book Search engineering lead Dan Clancy takes issue with The Last Library characterization. He acknowledges that some of the works Google has scanned will never be scanned again. But he's adamant that although Google has a 10-million-book head start - and a monopoly-building boondoggle of a settlement with authors and publishers - others will compete.
"I don't view Google Book Search as the one and only library," he said. "I don't think it should be and I don't think it will be - in part because, remember, a library is about accessing information, not just accessing books. Libraries were created because books were where information was in the past.
"Libraries are about information, and...Google is not the only book-scanning activity in existence today. There will continue to be other activities. And the internet provides all sorts of information that are linked together in all sorts of ways."...
Though he wouldn't say how much Google has spent scanning books, Clancy admitted it wasn't cheap. "It's a lot," he said. "If this was just tens of millions of dollars, we wouldn't all be sitting here debating this. Microsoft would have kept scanning. And there would be much more incentive to do this.""
http://www.theregister.co.uk/2009/08/29/google_books/