Compact System

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Monday, 17 August 2009

On the predictability of Search Trends

Posted on 14:19 by Unknown
Posted by Yossi Matias, Niv Efron, and Yair Shimshoni, Google Labs, Israel.

Since launching Google Trends and Google Insights for Search, we've been providing daily insight into what the world is searching for. An understanding of search trends can be useful for advertisers, marketers, economists, scholars, and anyone else interested in knowing more about their world and what's currently top-of-mind.

As many have observed, the trends of some search queries are quite seasonal and have repeated patterns. See, for instance, the search trends for the query "ski" hit their peak during the winter seasons in the US and Australia. The search trends for basketball correlate with annual league events, and are consistent year-over-year. When looking at trends of the aggregated volume of search queries related to particular categories, one can also observe regular patterns in some categories like Food & Drink or Automotive. Such trends sequences appear quite predictable, and one would naturally expect the patterns of previous years to repeat looking forward.

On the other hand, for many other search queries and categories, the trends are quite irregular and hard to predict. Examples include the search trends for obama, twitter, android, or global warming, and the trend of aggregate searches in the News & Current Events category.

Having predictable trends for a search query or for a group of queries could have interesting ramifications. One could forecast the trends into the future, and use it as a "best guess" for various business decisions such as budget planning, marketing campaigns and resource allocations. One could identify deviation from such forecasting and identify new factors that are influencing the search volume as demonstrated in Flu Trends.

We were therefore interested in the following questions:
  • How many search queries have trends that are predictable?
  • Are some categories more predictable than others? How is the distribution of predictable trends between the various categories?
  • How predictable are the trends of aggregated search queries for different categories? Which categories are more predictable and which are less so?
To learn about the predictability of search trends, and so as to overcome our basic limitation of not knowing what the future will entail, we characterize the predictability of a Trends series based on its historical performance. In other words, we estimate the a posteriori predictability of a sequence determined by the error of forecasted trends vs the actual performance.

Specifically, we have used a simple forecasting model that learns basic seasonality and general trend. For each trends sequence of interest, we take a point in time, t, which is about a year back, compute a one year forecasting for t based on historical data available at time t, and compare it to the actual trends sequence that occurs since time t. The error between the forecasting trends and the actual trends characterizes the predictability level of a sequence, and when the error is smaller than a pre-defined threshold, we denote the trends query as predictable.

Our work to date is summarized in a paper called On the Predictability of Search Trends which includes the following observations:
  • Over half of the most popular Google search queries are predictable in a 12 month ahead forecast, with a mean absolute prediction error of about 12%.
  • Nearly half of the most popular queries are not predictable (with respect to the model we have used).
  • Some categories have particularly high fraction of predictable queries; for instance, Health (74%), Food & Drink (67%) and Travel (65%).
  • Some categories have particularly low fraction of predictable queries; for instance, Entertainment (35%) and Social Networks & Online Communities (27%).
  • The trends of aggregated queries per categories are much more predictable: 88% of the aggregated category search trends of over 600 categories in Insights for Search are predictable, with a mean absolute prediction error of of less than 6%.
  • There is a clear association between the existence of seasonality patterns and higher predictability, as well as an association between high levels of outliers and lower predictability. For the Entertainment category that has typically less seasonal search behavior as well as relatively higher number of singular spikes of interest, we have seen a predictability of 35%, where as the category of Travel with a very seasonal behavior and lower tendency for short spikes of interest had a predictability of 65%.
  • One should expect the actual search trends to deviate from forecast for many predictable queries, due to possible events and dynamic circumstances.
  • We show the forecasting vs actual for trends of a few categories, including some that were used recently for predicting the present of various economic indicators. This demonstrates how forecasting can serve as a good baseline for identifying interesting deviations in actual search traffic.
As we see that many of the search trends are predictable, we are introducing today a new forecasting feature in Insights for Search, along with a new version of the product. The forecasting feature is applied to queries which are identified as predictable (see, for instance, basketball or the trends in the Automotive category) and then shown as an extrapolation of the historical trends and search patterns.

There are many more questions that can be looked at regarding search trends in general, and their predictability in particular, including design and testing more advanced forecasting models, getting other insights into the distributions of sequences, and demonstrating interesting deviations of actual-vs-forecast for predictable trends series. We'd love to hear from you - share with us your findings, published results or insights - email us at insightsforsearch@google.com.
Email ThisBlogThis!Share to XShare to Facebook
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • CDC Birth Vital Statistics in BigQuery
    Posted by Dan Vanderkam, Software Engineer Google’s BigQuery Service lets enterprises and developers crunch large-scale data sets quickly...
  • Towards Energy-Proportional Datacenters
    Posted by Dennis Abts, Michael R. Marty, Philip M. Wells, Peter Klausler, and Hong Liu This is part of the series highlighting some notable...
  • Site Reliability Engineers: “solving the most interesting problems”
    Posted by Chris Reid, Sydney Staffing team I recently sat down with Ben Appleton, a Senior Staff Software Engineer, to talk about his recent...
  • Our Faculty Institute brings faculty back to the drawing board
    Posted by Nina Kim Schultz, Google Education Research Cross-posted with the Official Google Blog School may still be out for summer, but tea...
  • Market Algorithms and Optimization Meeting
    Posted by  Vahab S. Mirrokni and Muthu Muthukrishnan Google auctions ads, and enables a market with millions of advertisers and users.  This...
  • Our Unique Approach to Research
    Posted by  Alfred Spector , Vice President of Research and Special Initiatives Google started as a research project —and research has remain...
  • Impact of Organic Ranking on Ad Click Incrementality
    Posted by David Chan, Statistician and Lizzy Van Alstine, Research Evangelist  In 2011, Google released a Search Ads Pause research study w...
  • Large-scale graph computing at Google
    Posted by Grzegorz Czajkowski, Systems Infrastructure Team If you squint the right way, you will notice that graphs are everywhere. For exam...
  • Continuing the quest for future computer scientists with CS4HS
    Erin Mindell, Program Manager, Google Education Computer Science for High School (CS4HS) began five years ago with a simple question: How c...
  • Millions of Core-Hours Awarded to Science
    Posted by Andrea Held, Program Manager, University Relations In 2011 Google University Relations launched a new academic research awards pr...

Categories

  • accessibility
  • ACL
  • ACM
  • Acoustic Modeling
  • ads
  • adsense
  • adwords
  • Africa
  • Android
  • API
  • App Engine
  • App Inventor
  • Audio
  • Awards
  • Cantonese
  • China
  • Computer Science
  • conference
  • conferences
  • correlate
  • crowd-sourcing
  • CVPR
  • datasets
  • Deep Learning
  • distributed systems
  • Earth Engine
  • economics
  • Education
  • Electronic Commerce and Algorithms
  • EMEA
  • EMNLP
  • entities
  • Exacycle
  • Faculty Institute
  • Faculty Summit
  • Fusion Tables
  • gamification
  • Google Books
  • Google+
  • Government
  • grants
  • HCI
  • Image Annotation
  • Information Retrieval
  • internationalization
  • Interspeech
  • jsm
  • jsm2011
  • K-12
  • Korean
  • Labs
  • localization
  • Machine Hearing
  • Machine Learning
  • Machine Translation
  • MapReduce
  • market algorithms
  • Market Research
  • ML
  • MOOC
  • NAACL
  • Natural Language Processing
  • Networks
  • Ngram
  • NIPS
  • NLP
  • open source
  • operating systems
  • osdi
  • osdi10
  • patents
  • ph.d. fellowship
  • PiLab
  • Policy
  • Public Data Explorer
  • publication
  • Publications
  • renewable energy
  • Research Awards
  • resource optimization
  • Search
  • search ads
  • Security and Privacy
  • SIGMOD
  • Site Reliability Engineering
  • Speech
  • statistics
  • Structured Data
  • Systems
  • Translate
  • trends
  • TV
  • UI
  • University Relations
  • UNIX
  • User Experience
  • video
  • Vision Research
  • Visiting Faculty
  • Visualization
  • Voice Search
  • Wiki
  • wikipedia
  • WWW
  • YouTube

Blog Archive

  • ►  2013 (51)
    • ►  December (3)
    • ►  November (9)
    • ►  October (2)
    • ►  September (5)
    • ►  August (2)
    • ►  July (6)
    • ►  June (7)
    • ►  May (5)
    • ►  April (3)
    • ►  March (4)
    • ►  February (4)
    • ►  January (1)
  • ►  2012 (59)
    • ►  December (4)
    • ►  October (4)
    • ►  September (3)
    • ►  August (9)
    • ►  July (9)
    • ►  June (7)
    • ►  May (7)
    • ►  April (2)
    • ►  March (7)
    • ►  February (3)
    • ►  January (4)
  • ►  2011 (51)
    • ►  December (5)
    • ►  November (2)
    • ►  September (3)
    • ►  August (4)
    • ►  July (9)
    • ►  June (6)
    • ►  May (4)
    • ►  April (4)
    • ►  March (5)
    • ►  February (5)
    • ►  January (4)
  • ►  2010 (44)
    • ►  December (7)
    • ►  November (2)
    • ►  October (9)
    • ►  September (7)
    • ►  August (2)
    • ►  July (7)
    • ►  June (3)
    • ►  May (2)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
    • ►  January (2)
  • ▼  2009 (44)
    • ►  December (8)
    • ►  November (4)
    • ▼  August (4)
      • 51 Languages in Google Translate
      • On the predictability of Search Trends
      • Under the Hood of App Inventor for Android
      • Two Views from the 2009 Google Faculty Summit
    • ►  July (5)
    • ►  June (5)
    • ►  May (4)
    • ►  April (6)
    • ►  March (3)
    • ►  February (1)
    • ►  January (4)
  • ►  2008 (11)
    • ►  December (1)
    • ►  November (1)
    • ►  October (1)
    • ►  September (1)
    • ►  July (1)
    • ►  May (3)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
  • ►  2007 (9)
    • ►  October (1)
    • ►  September (2)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  February (2)
  • ►  2006 (15)
    • ►  December (1)
    • ►  November (1)
    • ►  September (1)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  April (3)
    • ►  March (4)
    • ►  February (1)
Powered by Blogger.

About Me

Unknown
View my complete profile