Intellectual Property (IP), Artificial Intelligence (AI), Open Movements (OM): mining Google's scanned books

Tuesday, December 10, 2013

In a Scoreboard of Words, a Cultural Guide; New York Times, 12/7/13

Natasha Singer, New York Times; In a Scoreboard of Words, a Cultural Guide: "“We wanted to create a scientific measuring instrument, something like a telescope, but instead of pointing it at a star, you point it at human culture,” Mr. Michel recalls. The pair approached Peter Norvig, the director of research at Google, with a pie-in-the-sky proposal: to mine Google’s library of digital books so they could apply automated quantitative analysis to the typically qualitative study of history. According to the book, Mr. Norvig was intrigued. But he challenged the graduate students by asking how such a system could work without violating copyright. After some thought, Mr. Aiden and Mr. Michel proposed creating a kind of “shadow data set” that would contain frequency statistics on the most common words or phrases in the digitized books — but would not contain the books’ actual texts. The pair developed a prototype interface, called Bookworm, to prove their idea. Soon after, engineers at Google, including Jon Orwant and Will Brockman, built a public, web-based version of the tool."
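
The "shadow data set" idea described in the excerpt can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the method Google or the Bookworm prototype actually used: the function names `ngrams` and `build_shadow_dataset`, the simple tokenizer, the `min_count` cutoff, and the two sample texts are all hypothetical. The sketch counts words and two-word phrases per year, keeps only those that occur often enough across the whole collection, and retains the counts without retaining the texts themselves.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Split text into lowercase word tokens and return its n-word phrases."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_shadow_dataset(books, max_n=2, min_count=2):
    """Return per-year frequency counts for common words and phrases.

    books: iterable of (year, text) pairs (hypothetical input format).
    max_n: longest phrase length to count (assumed value).
    min_count: minimum total occurrences for a phrase to be kept
               (assumed threshold; rare phrases are dropped).
    """
    totals = Counter()   # counts across the whole collection
    by_year = {}         # year -> Counter of phrase counts
    for year, text in books:
        year_counts = by_year.setdefault(year, Counter())
        for n in range(1, max_n + 1):
            for gram in ngrams(text, n):
                totals[gram] += 1
                year_counts[gram] += 1

    # Keep only the common phrases; the source texts are not stored.
    common = {gram for gram, count in totals.items() if count >= min_count}
    return {
        year: {gram: count for gram, count in counts.items() if gram in common}
        for year, counts in by_year.items()
    }


# Hypothetical sample data, invented for this sketch.
books = [
    (1900, "The telescope points at a star."),
    (1950, "The telescope points at human culture."),
]
shadow = build_shadow_dataset(books)
```

In this toy run, `shadow[1900]` holds counts for phrases such as "telescope" and "the telescope", which appear in both sample texts, while "star" is dropped because it falls below the assumed cutoff. The result contains frequency statistics only, so the original sentences are not stored in it.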