Compact System

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Thursday, 16 December 2010

Find out what’s in a word, or five, with the Google Books Ngram Viewer

Posted on 11:00 by Unknown
Posted by Jon Orwant, Engineering Manager, Google Books

[Cross-posted from the Google Books Blog]

Scholars interested in topics such as philosophy, religion, politics, art and language have employed qualitative approaches such as literary and critical analysis with great success. As more of the world’s literature becomes available online, it’s increasingly possible to apply quantitative methods to complement that research. So today Will Brockman and I are happy to announce a new visualization tool called the Google Books Ngram Viewer, available on Google Labs. We’re also making the datasets backing the Ngram Viewer, produced by Matthew Gray and intern Yuan K. Shen, freely downloadable so that scholars will be able to create replicable experiments in the style of traditional scientific discovery.

Comparing instances of [flute], [guitar], [drum] and [trumpet] (
blue, red, yellow and green respectively)
in English literature from 1750 to 2008

Since 2004, Google has digitized more than 15 million books worldwide. The datasets we’re making available today to further humanities research are based on a subset of that corpus, weighing in at 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish. The datasets contain phrases of up to five words with counts of how often they occurred in each year.

These datasets were the basis of a research project led by Harvard University’s Jean-Baptiste Michel and Erez Lieberman Aiden published today in Science and coauthored by several Googlers. Their work provides several examples of how quantitative methods can provide insights into topics as diverse as the spread of innovations, the effects of youth and profession on fame, and trends in censorship.

The Ngram Viewer lets you graph and compare phrases from these datasets over time, showing how their usage has waxed and waned over the years. One of the advantages of having data online is that it lowers the barrier to serendipity: you can stumble across something in these 500 billion words and be the first person ever to make that discovery. Below I’ve listed a few interesting queries to pique your interest:

World War I, Great War
child care, nursery school, kindergarten
fax, phone, email
look before you leap, he who hesitates is lost
virus, bacteria
tofu, hot dog
burnt, burned
flute, guitar, trumpet, drum
Paris, London, New York, Boston, Rome
laptop, mainframe, microcomputer, minicomputer
fry, bake, grill, roast
George Washington, Thomas Jefferson, Abraham Lincoln
supercalifragilisticexpialidocious

We know nothing can replace the balance of art and science that is the qualitative cornerstone of research in the humanities. But we hope the Google Books Ngram Viewer will spark some new hypotheses ripe for in-depth investigation, and invite casual exploration at the same time. We’ve started working with some researchers already via our Digital Humanities Research Awards, and look forward to additional collaboration with like-minded researchers in the future.
Email ThisBlogThis!Share to XShare to Facebook
Posted in Google Books, Ngram, Research Awards | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • New research from Google shows that 88% of the traffic generated by mobile search ads is not replaced by traffic originating from mobile organic search
    Posted by Shaun Lysen, Statistician at Google Often times people are presented with two choices after making a search on their devices - the...
  • Education Awards on Google App Engine
    Posted by Andrea Held, Google University Relations Cross-posted with Google Developers Blog Last year we invited proposals for innovative p...
  • More researchers dive into the digital humanities
    Posted by Jon Orwant, Engineering Manager for Google Books When we started Google Book Search back in 2004, we were driven by the desire to...
  • Google, the World Wide Web and WWW conference: years of progress, prosperity and innovation
    Posted by Prabhakar Raghavan, Vice President of Engineering More than forty members of Google’s technical staff gathered in Lyon, France i...
  • Query Language Modeling for Voice Search
    Posted by Ciprian Chelba, Research Scientist About three years ago we set a goal to enable speaking to the Google Search engine on smart-pho...
  • Announcing our Q4 Research Awards
    Posted by Maggie Johnson, Director of Education & University Relations and Jeff Walz, Head of University Relations We do a significant a...
  • Word of Mouth: Introducing Voice Search for Indonesian, Malaysian and Latin American Spanish
    Posted by Linne Ha, International Program Manager Read more about the launch of Voice Search in Latin American Spanish on the Google América...
  • Under the Hood of App Inventor for Android
    Posted by Bill Magnuson, Hal Abelson, and Mark Friedman We recently announced our App Inventor for Android project on the Google Research B...
  • Make Your Websites More Accessible to More Users with Introduction to Web Accessibility
    Eve Andersson, Manager, Accessibility Engineering Cross-posted with  Google Developer's Blog You work hard to build clean, intuitive web...
  • 11 Billion Clues in 800 Million Documents: A Web Research Corpus Annotated with Freebase Concepts
    Posted by Dave Orr, Amar Subramanya, Evgeniy Gabrilovich, and Michael Ringgaard, Google Research “I assume that by knowing the truth you mea...

Categories

  • accessibility
  • ACL
  • ACM
  • Acoustic Modeling
  • ads
  • adsense
  • adwords
  • Africa
  • Android
  • API
  • App Engine
  • App Inventor
  • Audio
  • Awards
  • Cantonese
  • China
  • Computer Science
  • conference
  • conferences
  • correlate
  • crowd-sourcing
  • CVPR
  • datasets
  • Deep Learning
  • distributed systems
  • Earth Engine
  • economics
  • Education
  • Electronic Commerce and Algorithms
  • EMEA
  • EMNLP
  • entities
  • Exacycle
  • Faculty Institute
  • Faculty Summit
  • Fusion Tables
  • gamification
  • Google Books
  • Google+
  • Government
  • grants
  • HCI
  • Image Annotation
  • Information Retrieval
  • internationalization
  • Interspeech
  • jsm
  • jsm2011
  • K-12
  • Korean
  • Labs
  • localization
  • Machine Hearing
  • Machine Learning
  • Machine Translation
  • MapReduce
  • market algorithms
  • Market Research
  • ML
  • MOOC
  • NAACL
  • Natural Language Processing
  • Networks
  • Ngram
  • NIPS
  • NLP
  • open source
  • operating systems
  • osdi
  • osdi10
  • patents
  • ph.d. fellowship
  • PiLab
  • Policy
  • Public Data Explorer
  • publication
  • Publications
  • renewable energy
  • Research Awards
  • resource optimization
  • Search
  • search ads
  • Security and Privacy
  • SIGMOD
  • Site Reliability Engineering
  • Speech
  • statistics
  • Structured Data
  • Systems
  • Translate
  • trends
  • TV
  • UI
  • University Relations
  • UNIX
  • User Experience
  • video
  • Vision Research
  • Visiting Faculty
  • Visualization
  • Voice Search
  • Wiki
  • wikipedia
  • WWW
  • YouTube

Blog Archive

  • ►  2013 (51)
    • ►  December (3)
    • ►  November (9)
    • ►  October (2)
    • ►  September (5)
    • ►  August (2)
    • ►  July (6)
    • ►  June (7)
    • ►  May (5)
    • ►  April (3)
    • ►  March (4)
    • ►  February (4)
    • ►  January (1)
  • ►  2012 (59)
    • ►  December (4)
    • ►  October (4)
    • ►  September (3)
    • ►  August (9)
    • ►  July (9)
    • ►  June (7)
    • ►  May (7)
    • ►  April (2)
    • ►  March (7)
    • ►  February (3)
    • ►  January (4)
  • ►  2011 (51)
    • ►  December (5)
    • ►  November (2)
    • ►  September (3)
    • ►  August (4)
    • ►  July (9)
    • ►  June (6)
    • ►  May (4)
    • ►  April (4)
    • ►  March (5)
    • ►  February (5)
    • ►  January (4)
  • ▼  2010 (44)
    • ▼  December (7)
      • More researchers dive into the digital humanities
      • Robot hackathon connects with Android, browsers an...
      • Find out what’s in a word, or five, with the Googl...
      • Letting everyone do great things with App Inventor
      • $6 million to faculty in Q4 Research Awards
      • Four Googlers elected ACM Fellows this year
      • Google Launches Cantonese Voice Search in Hong Kong
    • ►  November (2)
    • ►  October (9)
    • ►  September (7)
    • ►  August (2)
    • ►  July (7)
    • ►  June (3)
    • ►  May (2)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
    • ►  January (2)
  • ►  2009 (44)
    • ►  December (8)
    • ►  November (4)
    • ►  August (4)
    • ►  July (5)
    • ►  June (5)
    • ►  May (4)
    • ►  April (6)
    • ►  March (3)
    • ►  February (1)
    • ►  January (4)
  • ►  2008 (11)
    • ►  December (1)
    • ►  November (1)
    • ►  October (1)
    • ►  September (1)
    • ►  July (1)
    • ►  May (3)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
  • ►  2007 (9)
    • ►  October (1)
    • ►  September (2)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  February (2)
  • ►  2006 (15)
    • ►  December (1)
    • ►  November (1)
    • ►  September (1)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  April (3)
    • ►  March (4)
    • ►  February (1)
Powered by Blogger.

About Me

Unknown
View my complete profile