Compact System

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Tuesday, 15 December 2009

Teaching a Computer to Understand Japanese

Posted on 10:00 by Unknown
Posted by Mike Schuster, Google Research and Kaisuke Nakajima, Google Japan

On December 7th, we launched our new Japanese voice search system (音声検索), which has been available for various flavors of English since last year and for Mandarin Chinese for the past two months. The initial Japanese system works on the Android platform and also through the Google Mobile App on the iPhone as announced in a Japanese blog and a general explanation on how to get started. For developers who want to make use of the speech recognition backend for their own Android applications there is a public API (recognizer intent API) described here.

Although speech recognition has had a long history in Japan, creating a system that can handle a problem as difficult as voice search is still a considerable challenge. Today, most speech recognition systems are large statistical systems that must learn two models from sets of examples, an acoustic model and a language model. The acoustic model represents (statistically) the fundamental sounds of the language, and the language model statistically represents the words, phrases, and sentences of the language. The acoustic model for Japanese voice search was trained using a large amount of recorded Japanese speech, with the associated transcriptions of the words spoken. The language model for Japanese voice search was trained on Japanese search queries.

While speech recognition systems are surprisingly similar across different languages, there are some problems that are more specific to Japanese. Some of the challenges we faced while developing Japanese voice search included:

  • Spaces in Japanese text
    As we looked at some popular search queries in Japan we saw that Japanese often doesn't have spaces but sometimes it does. For example, if a user searches for Ramen noodles near Tokyo station they will often type: "東京駅 ラーメン" with a space in between Tokyo station and Ramen -- therefore, we would like to display it in this way as well. Getting the spaces right is difficult and we continue working to improve it.
  • Japanese word boundaries
    Word boundaries in Japanese are often not clear and subject to interpretation as most of the time there are not spaces between words. This also makes the definition of the vocabulary (the words that can be recognized theoretically) extremely large. We deal with this problem by finding likely word boundaries using a statistical system which also helps us limit the vocabulary.
  • Japanese text is written in 4 different writing systems
    Japanese text as written today uses Kanji, Hiragana, Katakana & Romaji, often mixed in the same sentence and sometimes in the same word, depending on the definition of a word. Try these example queries to see some interesting cases: "価格.com", "マーボー豆腐", "東京都渋谷区桜丘町26-1". We try to display the output in a way that is most user-friendly, which often means to display it as you would write it down.
  • Japanese has lots of basic characters and many have several pronunciations depending on context
    To be able to recognize a word you need to know its pronunciation. Western languages in general use only ASCII or a slightly extended set of characters which is relatively small (less than 100 total in most cases). For Japanese the number of basic characters is the union of all basic characters from the four writing systems mentioned above, which is several thousands in total. Finding the correct pronunciations for all words in the very large voice search vocabulary is difficult and is often done using a combination of human effort and automatic statistical systems. This is even more difficult in the Japanese case as the number of basic characters is higher and there are a vast number of exceptions, for example consider the case of: "一人" (hitori) versus "一人前" (ichininmae). Although the phrases look very similar they have completely different pronunciations.
  • Encoding issues
    Japanese characters can be written in many encoding systems including UTF-8, Shift_JIS, EUC-JP and others. While at Google we try to use exclusively UTF-8 there are still interesting edge cases to deal with. For example some characters exist in different forms in the same encoding system. Compare for example "カナ" and "カナ" -- they both say "kana" and mean the exact same thing, the first in full-width and the second in half-width. There are numerous similar cases like this in Japanese that make normalization of the text data more difficult.
  • Every speaker sounds different
    People speak in different styles, slow or fast, with an accent or without, have lower or higher pitched voices, etc. To make it work for all these different conditions we trained our system on data from many different sources to capture as many conditions as possible.
The challenges listed above are just a small portion of what we dealt with while building the Japanese voice search system. Over time, we are committed to improve the system as much as and as quickly as possible to make speech, in addition to the keyboard, a user-friendly input modality on mobile devices. We will push a first set of improved models this week.
Email ThisBlogThis!Share to XShare to Facebook
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • New research from Google shows that 88% of the traffic generated by mobile search ads is not replaced by traffic originating from mobile organic search
    Posted by Shaun Lysen, Statistician at Google Often times people are presented with two choices after making a search on their devices - the...
  • Education Awards on Google App Engine
    Posted by Andrea Held, Google University Relations Cross-posted with Google Developers Blog Last year we invited proposals for innovative p...
  • More researchers dive into the digital humanities
    Posted by Jon Orwant, Engineering Manager for Google Books When we started Google Book Search back in 2004, we were driven by the desire to...
  • Google, the World Wide Web and WWW conference: years of progress, prosperity and innovation
    Posted by Prabhakar Raghavan, Vice President of Engineering More than forty members of Google’s technical staff gathered in Lyon, France i...
  • Query Language Modeling for Voice Search
    Posted by Ciprian Chelba, Research Scientist About three years ago we set a goal to enable speaking to the Google Search engine on smart-pho...
  • Announcing our Q4 Research Awards
    Posted by Maggie Johnson, Director of Education & University Relations and Jeff Walz, Head of University Relations We do a significant a...
  • Word of Mouth: Introducing Voice Search for Indonesian, Malaysian and Latin American Spanish
    Posted by Linne Ha, International Program Manager Read more about the launch of Voice Search in Latin American Spanish on the Google América...
  • Under the Hood of App Inventor for Android
    Posted by Bill Magnuson, Hal Abelson, and Mark Friedman We recently announced our App Inventor for Android project on the Google Research B...
  • Make Your Websites More Accessible to More Users with Introduction to Web Accessibility
    Eve Andersson, Manager, Accessibility Engineering Cross-posted with  Google Developer's Blog You work hard to build clean, intuitive web...
  • 11 Billion Clues in 800 Million Documents: A Web Research Corpus Annotated with Freebase Concepts
    Posted by Dave Orr, Amar Subramanya, Evgeniy Gabrilovich, and Michael Ringgaard, Google Research “I assume that by knowing the truth you mea...

Categories

  • accessibility
  • ACL
  • ACM
  • Acoustic Modeling
  • ads
  • adsense
  • adwords
  • Africa
  • Android
  • API
  • App Engine
  • App Inventor
  • Audio
  • Awards
  • Cantonese
  • China
  • Computer Science
  • conference
  • conferences
  • correlate
  • crowd-sourcing
  • CVPR
  • datasets
  • Deep Learning
  • distributed systems
  • Earth Engine
  • economics
  • Education
  • Electronic Commerce and Algorithms
  • EMEA
  • EMNLP
  • entities
  • Exacycle
  • Faculty Institute
  • Faculty Summit
  • Fusion Tables
  • gamification
  • Google Books
  • Google+
  • Government
  • grants
  • HCI
  • Image Annotation
  • Information Retrieval
  • internationalization
  • Interspeech
  • jsm
  • jsm2011
  • K-12
  • Korean
  • Labs
  • localization
  • Machine Hearing
  • Machine Learning
  • Machine Translation
  • MapReduce
  • market algorithms
  • Market Research
  • ML
  • MOOC
  • NAACL
  • Natural Language Processing
  • Networks
  • Ngram
  • NIPS
  • NLP
  • open source
  • operating systems
  • osdi
  • osdi10
  • patents
  • ph.d. fellowship
  • PiLab
  • Policy
  • Public Data Explorer
  • publication
  • Publications
  • renewable energy
  • Research Awards
  • resource optimization
  • Search
  • search ads
  • Security and Privacy
  • SIGMOD
  • Site Reliability Engineering
  • Speech
  • statistics
  • Structured Data
  • Systems
  • Translate
  • trends
  • TV
  • UI
  • University Relations
  • UNIX
  • User Experience
  • video
  • Vision Research
  • Visiting Faculty
  • Visualization
  • Voice Search
  • Wiki
  • wikipedia
  • WWW
  • YouTube

Blog Archive

  • ►  2013 (51)
    • ►  December (3)
    • ►  November (9)
    • ►  October (2)
    • ►  September (5)
    • ►  August (2)
    • ►  July (6)
    • ►  June (7)
    • ►  May (5)
    • ►  April (3)
    • ►  March (4)
    • ►  February (4)
    • ►  January (1)
  • ►  2012 (59)
    • ►  December (4)
    • ►  October (4)
    • ►  September (3)
    • ►  August (9)
    • ►  July (9)
    • ►  June (7)
    • ►  May (7)
    • ►  April (2)
    • ►  March (7)
    • ►  February (3)
    • ►  January (4)
  • ►  2011 (51)
    • ►  December (5)
    • ►  November (2)
    • ►  September (3)
    • ►  August (4)
    • ►  July (9)
    • ►  June (6)
    • ►  May (4)
    • ►  April (4)
    • ►  March (5)
    • ►  February (5)
    • ►  January (4)
  • ►  2010 (44)
    • ►  December (7)
    • ►  November (2)
    • ►  October (9)
    • ►  September (7)
    • ►  August (2)
    • ►  July (7)
    • ►  June (3)
    • ►  May (2)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
    • ►  January (2)
  • ▼  2009 (44)
    • ▼  December (8)
      • Announcing our Q4 Research Awards
      • Teaching a Computer to Understand Japanese
      • Research Areas of Interest - Multimedia
      • Machine Learning with Quantum Algorithms
      • Celebrating Computer Science Education Week
      • Join us for the 2010 Google GRAD CS Forum!
      • Automatic Captioning in YouTube
      • Four Googlers elected ACM Fellows
    • ►  November (4)
    • ►  August (4)
    • ►  July (5)
    • ►  June (5)
    • ►  May (4)
    • ►  April (6)
    • ►  March (3)
    • ►  February (1)
    • ►  January (4)
  • ►  2008 (11)
    • ►  December (1)
    • ►  November (1)
    • ►  October (1)
    • ►  September (1)
    • ►  July (1)
    • ►  May (3)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
  • ►  2007 (9)
    • ►  October (1)
    • ►  September (2)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  February (2)
  • ►  2006 (15)
    • ►  December (1)
    • ►  November (1)
    • ►  September (1)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  April (3)
    • ►  March (4)
    • ►  February (1)
Powered by Blogger.

About Me

Unknown
View my complete profile