Compact System

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Thursday, 10 March 2011

Large Scale Image Annotation: Learning to Rank with Joint Word-Image Embeddings

Posted on 08:00 by Unknown
Posted by Jason Weston and Samy Bengio, Research Team

In our paper, we introduce a generic framework to find a joint representation of images and their labels, which can then be used for various tasks, including image ranking and image annotation.

We focus on the task of automatic assignment of annotations (text labels) to images given only the pixel representation of the image (i.e., with no known metadata). This is achieved by a learning algorithm, that is, where the computer learns to predict annotations for new images given annotated training images. Such training datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. In this paper, we propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at the top of the ranked list of annotations for a given image and learning a low-dimensional joint embedding vector space for both images and annotations. Our system learns an interpretable model, where annotations with alternate wordings ("president obama" or "barack"), different languages ("tour eiffel" or "eiffel tower"), or similar concepts (such as "toad" or "frog") are close in the embedding space. Hence, even when our model does not predict the exact annotation given by a human labeler, it often predicts similar annotations.

Our system is trained on ~10 million images with ~100,000 possible annotation types and it annotates a single new image in ~0.17 seconds (not including feature processing) and consumes only 82MB of memory. Our method both outperforms all the methods we tested against and in comparison to them is faster and consumes less memory, making it possible to house such a system on a laptop or mobile device.
Email ThisBlogThis!Share to XShare to Facebook
Posted in Image Annotation, Vision Research | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Towards Energy-Proportional Datacenters
    Posted by Dennis Abts, Michael R. Marty, Philip M. Wells, Peter Klausler, and Hong Liu This is part of the series highlighting some notable...
  • CDC Birth Vital Statistics in BigQuery
    Posted by Dan Vanderkam, Software Engineer Google’s BigQuery Service lets enterprises and developers crunch large-scale data sets quickly...
  • Market Algorithms and Optimization Meeting
    Posted by  Vahab S. Mirrokni and Muthu Muthukrishnan Google auctions ads, and enables a market with millions of advertisers and users.  This...
  • International Conference on Machine Learning (ICML 2009) in Montreal
    Posted by Eyal Even Dar and Vahab Mirrokni , Google Research, NY The 26th International Conference on Machine Learning ( ICML 2009 ) was re...
  • Site Reliability Engineers: “solving the most interesting problems”
    Posted by Chris Reid, Sydney Staffing team I recently sat down with Ben Appleton, a Senior Staff Software Engineer, to talk about his recent...
  • Two Views from the 2009 Google Faculty Summit
    Posted by Alfred Spector, Vice President of Research and Special Initiatives [cross-posted with the Official Google Blog ] We held our fifth...
  • Focusing on Our Users: The Google Health Redesign
    Posted by Hendrik Mueller, User Experience Researcher When I relocated to New York City a few years ago, some of the most important health i...
  • Supporting computer science education with CS4HS
    Posted by Terry Ednacot, Education Program Manager Recent statistics have shown a decline in the number of U.S. students taking computer sc...
  • Large-scale graph computing at Google
    Posted by Grzegorz Czajkowski, Systems Infrastructure Team If you squint the right way, you will notice that graphs are everywhere. For exam...
  • Our Faculty Institute brings faculty back to the drawing board
    Posted by Nina Kim Schultz, Google Education Research Cross-posted with the Official Google Blog School may still be out for summer, but tea...

Categories

  • accessibility
  • ACL
  • ACM
  • Acoustic Modeling
  • ads
  • adsense
  • adwords
  • Africa
  • Android
  • API
  • App Engine
  • App Inventor
  • Audio
  • Awards
  • Cantonese
  • China
  • Computer Science
  • conference
  • conferences
  • correlate
  • crowd-sourcing
  • CVPR
  • datasets
  • Deep Learning
  • distributed systems
  • Earth Engine
  • economics
  • Education
  • Electronic Commerce and Algorithms
  • EMEA
  • EMNLP
  • entities
  • Exacycle
  • Faculty Institute
  • Faculty Summit
  • Fusion Tables
  • gamification
  • Google Books
  • Google+
  • Government
  • grants
  • HCI
  • Image Annotation
  • Information Retrieval
  • internationalization
  • Interspeech
  • jsm
  • jsm2011
  • K-12
  • Korean
  • Labs
  • localization
  • Machine Hearing
  • Machine Learning
  • Machine Translation
  • MapReduce
  • market algorithms
  • Market Research
  • ML
  • MOOC
  • NAACL
  • Natural Language Processing
  • Networks
  • Ngram
  • NIPS
  • NLP
  • open source
  • operating systems
  • osdi
  • osdi10
  • patents
  • ph.d. fellowship
  • PiLab
  • Policy
  • Public Data Explorer
  • publication
  • Publications
  • renewable energy
  • Research Awards
  • resource optimization
  • Search
  • search ads
  • Security and Privacy
  • SIGMOD
  • Site Reliability Engineering
  • Speech
  • statistics
  • Structured Data
  • Systems
  • Translate
  • trends
  • TV
  • UI
  • University Relations
  • UNIX
  • User Experience
  • video
  • Vision Research
  • Visiting Faculty
  • Visualization
  • Voice Search
  • Wiki
  • wikipedia
  • WWW
  • YouTube

Blog Archive

  • ►  2013 (51)
    • ►  December (3)
    • ►  November (9)
    • ►  October (2)
    • ►  September (5)
    • ►  August (2)
    • ►  July (6)
    • ►  June (7)
    • ►  May (5)
    • ►  April (3)
    • ►  March (4)
    • ►  February (4)
    • ►  January (1)
  • ►  2012 (59)
    • ►  December (4)
    • ►  October (4)
    • ►  September (3)
    • ►  August (9)
    • ►  July (9)
    • ►  June (7)
    • ►  May (7)
    • ►  April (2)
    • ►  March (7)
    • ►  February (3)
    • ►  January (4)
  • ▼  2011 (51)
    • ►  December (5)
    • ►  November (2)
    • ►  September (3)
    • ►  August (4)
    • ►  July (9)
    • ►  June (6)
    • ►  May (4)
    • ►  April (4)
    • ▼  March (5)
      • Word of Mouth: Introducing Voice Search for Indone...
      • Reading tea leaves in the tourism industry: A Case...
      • Games, auctions and beyond
      • Large Scale Image Annotation: Learning to Rank wit...
      • Building resources to syntactically parse the web
    • ►  February (5)
    • ►  January (4)
  • ►  2010 (44)
    • ►  December (7)
    • ►  November (2)
    • ►  October (9)
    • ►  September (7)
    • ►  August (2)
    • ►  July (7)
    • ►  June (3)
    • ►  May (2)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
    • ►  January (2)
  • ►  2009 (44)
    • ►  December (8)
    • ►  November (4)
    • ►  August (4)
    • ►  July (5)
    • ►  June (5)
    • ►  May (4)
    • ►  April (6)
    • ►  March (3)
    • ►  February (1)
    • ►  January (4)
  • ►  2008 (11)
    • ►  December (1)
    • ►  November (1)
    • ►  October (1)
    • ►  September (1)
    • ►  July (1)
    • ►  May (3)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
  • ►  2007 (9)
    • ►  October (1)
    • ►  September (2)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  February (2)
  • ►  2006 (15)
    • ►  December (1)
    • ►  November (1)
    • ►  September (1)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  April (3)
    • ►  March (4)
    • ►  February (1)
Powered by Blogger.

About Me

Unknown
View my complete profile