Compact System


Monday, 16 December 2013

Groundbreaking simulations by Google Exacycle Visiting Faculty

Posted by David Konerding, Staff Software Engineer

In April 2011, we announced the Google Exacycle for Visiting Faculty, a new academic research awards program donating one billion core-hours of computational capacity to researchers. The Exacycle project enables massive parallelism for doing science in the cloud, and inspired multiple proposals aiming to take advantage of cloud scale. Today, we would like to share some exciting results from a project built on Google’s infrastructure.

Google Research Scientist Kai Kohlhoff, in collaboration with Stanford University and Google engineers, investigated how an important signalling protein in the membrane of human cells can switch off and on by changing its three-dimensional structure following a sequence of local conformational changes. This research can help to better understand the effects of certain chemical compounds on the human body and assist future development of more potent drug molecules with fewer side effects.

The protein, known as the beta-2 adrenergic receptor, is a G protein-coupled receptor (GPCR), a primary drug target that plays a role in several debilitating health conditions. These include asthma, type-2 diabetes, obesity, and hypertension. The receptor and its close GPCR relatives bind to many familiar molecules, such as epinephrine, beta-blockers, and caffeine. Understanding their structure, function, and the underlying dynamics during binding and activation increases our chances to decode the causes and mechanisms of diseases.

To gain insights into the receptor’s dynamics, Kai performed detailed molecular simulations using hundreds of millions of core hours on Google’s infrastructure, generating hundreds of terabytes of valuable molecular dynamics data. The Exacycle program enabled the realization of simulations with longer sampling and higher accuracy than previous experiments, exposing the complex processes taking place on the nanoscale during activation of this biological switch.

The paper summarizing the results of Kai’s and his collaborators’ work is featured on the January cover of Nature Chemistry, with artwork by Google R&D UX Creative Lead Thor Lewis, to be published on December 17, 2013. The online version of the paper was published on the journal’s website today.

We are extremely pleased with the results of this program. We look forward to seeing this research continue to develop.

Wednesday, 11 December 2013

Googler Moti Yung elected as 2013 ACM Fellow

Posted by Alfred Spector, VP of Engineering

Yesterday, the Association for Computing Machinery (ACM) released the list of those who have been elected ACM Fellows in 2013. I am excited to announce that Google Research Scientist Moti Yung is among the distinguished individuals receiving this honor.

Moti was chosen for his contributions to computer science and cryptography that have provided fundamental knowledge to the field of computing security. We are proud of the breadth and depth of his contributions, and believe they serve as motivation for computer scientists worldwide.

On behalf of Google, I congratulate our colleague, who joins the 17 ACM Fellows and other professional society awardees at Google in exemplifying our extraordinarily talented people. You can read a more detailed summary of Moti’s accomplishments below, including the official citation from ACM.

Dr. Moti Yung: Research Scientist
For contributions to cryptography and its use in security and privacy of systems

Moti has made key contributions to several areas of cryptography including (but not limited to!) secure group communication, digital signatures, traitor tracing, threshold cryptosystems and zero-knowledge proofs. Moti's work often seeds a new area in theoretical cryptography while also finding broad application. For example, in 1992, Moti co-developed a protocol by which users can jointly compute a group key from their own private information that is secure against coalitions of rogue users. This work led to the growth of the broadcast encryption research area and has applications to pay-TV, network communication and sensor networks.
Moti is also a long-time leader of the security and privacy research communities, having mentored many of the leading researchers in the field, and serving on numerous program committees. A prolific author, Moti routinely publishes 10+ papers a year, and has been a key contributor to principled and consistent anonymization practices and data protection at Google.

Tuesday, 3 December 2013

Free Language Lessons for Computers

Posted by Dave Orr, Google Research Product Manager

Not everything that can be counted counts.
Not everything that counts can be counted.
- William Bruce Cameron

50,000 relations from Wikipedia. 100,000 feature vectors from YouTube videos. 1.8 million historical infoboxes. 40 million entities derived from webpages. 11 billion Freebase entities in 800 million web documents. 350 billion words’ worth from books analyzed for syntax.

These are all datasets that Google Research has shared with researchers around the world over the last year.

But data by itself doesn’t mean much. Data is only valuable in the right context, and only if it leads to increased knowledge. Labeled data is critical to train and evaluate machine-learned systems in many arenas, improving systems that can increase our ability to understand the world. Advances in natural language understanding, information retrieval, information extraction, computer vision, etc. can help us tell stories, mine for valuable insights, or visualize information in beautiful and compelling ways.

That’s why we are pleased to be able to release sets of labeled data from various domains and with various annotations, some automatic and some manual. Our hope is that the research community will use these datasets in ways both straightforward and surprising, to improve systems for annotation or understanding, and perhaps launch new efforts we haven’t thought of.

Here’s a listing of the major datasets we’ve released in the last year, or you can subscribe to our mailing list. Please tell us what you’ve managed to accomplish, or send us pointers to papers that use this data. We want to see what the research world can do with what we’ve created.

50,000 Lessons on How to Read: a Relation Extraction Corpus

What is it: A human-judged dataset of two relations involving public figures on Wikipedia: about 10,000 examples of “place of birth” and 40,000 examples of “attended or graduated from an institution.”
Where can I find it: https://code.google.com/p/relation-extraction-corpus/
I want to know more: Here’s a handy blog post with a broader explanation, descriptions and examples of the data, and plenty of links to learn more.
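
For readers who want to work with the corpus programmatically, here is a minimal sketch of loading it and tallying the human judgments. It assumes newline-delimited JSON records with "pred" and "judgments" fields; those field names and the file name are assumptions, so check the corpus documentation for the actual schema.

```python
# A minimal sketch (not an official tool): tallying rater agreement in the
# relation extraction corpus. Field names ("pred", "judgments", ...) and the
# file name are assumptions -- consult the corpus documentation for the schema.
import json
from collections import Counter

def load_records(path):
    """Yield one JSON record per non-empty line of a corpus dump."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

def judgment_summary(path):
    """Count how many examples of each relation the raters accepted by majority vote."""
    summary = Counter()
    for record in load_records(path):
        relation = record["pred"]                       # e.g. place of birth
        votes = [j["judgment"] for j in record["judgments"]]
        if votes.count("yes") > len(votes) / 2:         # simple majority vote
            summary[relation] += 1
    return summary

if __name__ == "__main__":
    print(judgment_summary("place_of_birth.json"))      # hypothetical file name
```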

11 Billion Clues in 800 Million Documents

What is it: We took the ClueWeb corpora and automatically labeled concepts and entities with Freebase concept IDs, an example of entity resolution. This dataset is huge: nearly 800 million web pages.
Where can I find it: We released two corpora: ClueWeb09 FACC and ClueWeb12 FACC.
I want to know more: We described the process and results in a recent blog post.

Features Extracted From YouTube Videos for Multiview Learning

What is it: Multiple feature families from a set of public YouTube videos of games. The videos are labeled with one of 30 categories, and each has an associated set of visual, auditory, and textual features.
Where can I find it: The data and more information can be obtained from the UCI machine learning repository (multiview video dataset), or from Google’s repository.
I want to know more: Read more about the data and uses for it here.

40 Million Entities in Context

What is it: A disambiguation set consisting of pointers to 10 million web pages with 40 million entities that have links to Wikipedia. This is another entity resolution corpus, since the links can be used to disambiguate the mentions, but unlike the ClueWeb example above, the links are inserted by the web page authors and can therefore be considered human annotation.
Where can I find it: Here’s the WikiLinks corpus, and tools to help you use this data can be found on our partner’s page: UMass Wiki-links.
I want to know more: Other disambiguation sets, data formats, ideas for uses of this data, and more can be found at our blog post announcing the release.

Distributing the Edit History of Wikipedia Infoboxes

What is it: The edit history of 1.8 million infoboxes in Wikipedia pages in one handy resource. Attributes on Wikipedia change over time, and some of them change more than others. Understanding attribute change is important for extracting accurate and useful information from Wikipedia.
Where can I find it: Download from Google or from Wikimedia Deutschland.
I want to know more: We posted a detailed look at the data, the process for gathering it, and where to find it. You can also read a paper we published on the release.
Note the change in the capital of Palau.


Syntactic Ngrams over Time

What is it: We automatically syntactically analyzed 350 billion words from the 3.5 million English language books in Google Books, and collated and released a set of fragments -- billions of unique tree fragments with counts sorted into types. The underlying corpus is the same one that underlies the recently updated Google Ngram Viewer.
Where can I find it: http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html
I want to know more: We discussed the nature of dependency parses and describe the data and release in a blog post. We also published a paper about the release.
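
As a rough illustration of consuming this data, the sketch below aggregates counts per year for fragments headed by a given word. It assumes the tab-separated layout of the release (head word, syntactic ngram, total count, then year,count pairs); treat that layout, and the file name in the usage comment, as assumptions to verify against the dataset documentation.

```python
# Illustrative sketch only: summing yearly counts in a syntactic-ngrams file.
# The assumed layout per line is:
#   head_word <TAB> syntactic_ngram <TAB> total_count <TAB> year,count <TAB> ...
# Verify against the dataset documentation before relying on it.
import gzip
from collections import defaultdict

def counts_by_year(path, target_head):
    """Aggregate counts per year for all fragments with a given head word."""
    per_year = defaultdict(int)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            head, ngram, total = fields[0], fields[1], int(fields[2])
            if head != target_head:
                continue
            for pair in fields[3:]:
                year, count = pair.split(",")
                per_year[int(year)] += int(count)
    return dict(per_year)

# Hypothetical usage (file name is a placeholder):
# print(counts_by_year("nodes.00.gz", "book"))
```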

Dictionaries for linking Text, Entities, and Ideas

What is it: A large database of 175 million strings paired with 7.5 million concepts, annotated with counts and mined from Wikipedia. The concepts in this case are Wikipedia articles, and the strings are anchor text spans that link to the concepts in question.
Where can I find it: http://nlp.stanford.edu/pubs/crosswikis-data.tar.bz2
I want to know more: A description of the data, several examples, and ideas for uses for it can be found in a blog post or in the associated paper.
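
A small sketch of the kind of lookup this dictionary enables appears below: given an anchor string, rank the concepts it most often links to. The simple tab-separated (string, concept, count) layout and the file name are illustrative assumptions, not the released file format.

```python
# Sketch of string-to-concept disambiguation with a linking dictionary.
# Assumes a tab-separated (string, concept, count) layout per line, which is
# an illustrative assumption, not the released file format.
from collections import defaultdict

def load_dictionary(path):
    """Build string -> {concept: count} from (string, concept, count) lines."""
    table = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            string, concept, count = line.rstrip("\n").split("\t")
            table[string][concept] = int(count)
    return table

def best_concepts(table, string, k=3):
    """Return the k concepts most often linked from the given anchor text."""
    candidates = table.get(string, {})
    total = sum(candidates.values()) or 1
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [(concept, count / total) for concept, count in ranked[:k]]

# Hypothetical usage:
# table = load_dictionary("dictionary.tsv")
# print(best_concepts(table, "jaguar"))   # animal vs. car brand, etc.
```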

Other datasets

Not every release had its own blog post describing it. Here are some other releases:
  • Automatic Freebase annotations of TREC’s Million Query and Web track queries.
  • A set of Freebase triples that have been deleted from Freebase over time -- 63 million of them.

Tuesday, 26 November 2013

Released Data Set: Features Extracted From YouTube Videos for Multiview Learning

Posted by Omid Madani, Senior Software Engineer

“If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.”
The “duck test”.

Performance of machine learning algorithms, supervised or unsupervised, is often significantly enhanced when a variety of feature families, or multiple views of the data, are available. For example, in the case of web pages, one feature family can be based on the words appearing on the page, and another can be based on the URLs and related connectivity properties. Similarly, videos contain both audio and visual signals where in turn each modality is analyzed in a variety of ways. For instance, the visual stream can be analyzed based on the color and edge distribution, texture, motion, object types, and so on. YouTube videos are also associated with textual information (title, tags, comments, etc.). Each feature family complements others in providing predictive signals to accomplish a prediction or classification task, for example, in automatically classifying videos into subject areas such as sports, music, comedy, games, and so on.

We have released a dataset of over 100k feature vectors extracted from public YouTube videos. These videos are labeled with one of 30 classes, each class corresponding to a video game (with some amount of class noise): each video shows gameplay of a video game, for example for teaching purposes. Each instance (video) is described by three feature families (textual, visual, and auditory), and each family is broken into subfamilies, yielding up to 13 feature types per instance. Neither video identities nor class identities are released.

We hope that this dataset will be valuable for research on a variety of multiview related machine learning topics, including multiview clustering, co-training, active learning, classifier fusion and ensembles.
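
To make the multiview idea concrete, here is a small, hedged sketch of late fusion with scikit-learn: train one classifier per feature family and average their predicted class probabilities. Loading the released files into per-view matrices is left out, and the sketch assumes integer class labels 0–29; both are assumptions made for illustration, not part of the dataset's own tooling.

```python
# A toy late-fusion baseline for multiview classification (illustrative only).
# X_text, X_visual, X_audio are assumed numpy arrays with one row per video,
# and y an integer label array (0..29); loading them from the released files
# is left to the reader.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def late_fusion_accuracy(views, y, seed=0):
    """Train one linear classifier per view and average their probabilities."""
    split = train_test_split(*views, y, test_size=0.2, random_state=seed)
    train_views, test_views = split[0:-2:2], split[1:-2:2]
    y_train, y_test = split[-2], split[-1]

    probas = []
    for X_train, X_test in zip(train_views, test_views):
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        probas.append(clf.predict_proba(X_test))
    fused = np.mean(probas, axis=0)           # simple average across views
    # argmax index equals the label only because labels are assumed 0..29.
    return np.mean(fused.argmax(axis=1) == y_test)

# Hypothetical usage:
# acc = late_fusion_accuracy([X_text, X_visual, X_audio], y)
```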

The data and more information can be obtained from the UCI machine learning repository (multiview video dataset), or from here.

Monday, 25 November 2013

The MiniZinc Challenge

Posted by Jon Orwant, Engineering Manager

Constraint Programming is a style of problem solving in which the properties of a solution are first identified, and a large space of candidate solutions is then searched to find the best one. Good constraint programming depends on modeling the problem well and on searching effectively. Poor representations or slow search techniques can make the difference between finding a good solution and finding no solution at all.

One example of constraint programming is scheduling: for instance, determining a schedule for a conference where there are 30 talks (that’s one constraint), only eight rooms to hold them in (that’s another constraint), and some talks can’t overlap (more constraints).
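
To make the scheduling example concrete, here is a minimal sketch using the CP-SAT solver from Google's open source or-tools suite (the modern Python API is shown for clarity; the solver that competed in 2013 predates it). The number of time slots and the conflicting-talk pairs are invented for illustration.

```python
# A minimal constraint-programming sketch of the conference-scheduling example,
# using or-tools CP-SAT. Slot count and conflict pairs are illustrative only.
from ortools.sat.python import cp_model

NUM_TALKS = 30
NUM_ROOMS = 8
NUM_SLOTS = 4  # assumed number of time slots

model = cp_model.CpModel()

# Decision variables: a room and a time slot for every talk.
room = [model.NewIntVar(0, NUM_ROOMS - 1, f"room_{t}") for t in range(NUM_TALKS)]
slot = [model.NewIntVar(0, NUM_SLOTS - 1, f"slot_{t}") for t in range(NUM_TALKS)]

# "Some talks can't overlap": hypothetical pairs that must not share a slot.
conflicting_pairs = [(0, 1), (2, 3), (10, 11)]
for a, b in conflicting_pairs:
    model.Add(slot[a] != slot[b])

# No two talks may share both a room and a slot: encode each (room, slot) pair
# as a single integer so AllDifferent can enforce it directly.
placement = []
for t in range(NUM_TALKS):
    p = model.NewIntVar(0, NUM_ROOMS * NUM_SLOTS - 1, f"place_{t}")
    model.Add(p == room[t] * NUM_SLOTS + slot[t])
    placement.append(p)
model.AddAllDifferent(placement)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for t in range(NUM_TALKS):
        print(f"talk {t}: room {solver.Value(room[t])}, slot {solver.Value(slot[t])}")
```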

Every year, some of the world’s top constraint programming researchers compete for medals in the MiniZinc challenge. Problems range from scheduling to vehicle routing to program verification and frequency allocation.

Google’s open source solver, or-tools, took two gold medals and two silver medals. The gold medals were in parallel and portfolio search, and the silver medals were in fixed and free search. Google’s success was due in part to integrating a SAT solver to handle boolean constraints, and a new presolve phase inherited from integer programming.

Laurent Perron, a member of Google’s Optimization team and a lead contributor to or-tools, noted that every year brings fresh techniques to the competition: “One of the big surprises this year was the success of lazy-clause generation, which combines techniques from the SAT and constraint programming communities.”

If you’re interested in learning more about constraint programming, you can start at the Wikipedia page, or have a look at or-tools.

The full list of winners is available here.

Friday, 22 November 2013

New Research Challenges in Language Understanding

Posted by Maggie Johnson, Director of Education and University Relations

We held the first global Language Understanding and Knowledge Discovery Focused Faculty Workshop in Nanjing, China, on November 14-15, 2013. Thirty-four faculty members from 10 countries and regions across APAC, EMEA and the US joined the workshop. Googlers from Research, Engineering and University Relations/University Programs also attended the event.

The 2-day workshop included keynote talks, panel discussions and break-out sessions [agenda]. It was an engaging and productive workshop, and we saw lots of positive interactions among the attendees. The workshop encouraged communication between Google and faculty around the world working in these areas.

Research in text mining continues to explore open questions relating to entity annotation, relation extraction, and more. The workshop’s goal was to brainstorm and discuss relevant topics to further investigate these areas. Ultimately, this research should help provide users search results that are much more relevant to them.

At the end of the workshop, participants identified four topics representing challenges and opportunities for further exploration in Language Understanding and Knowledge Discovery:

  • Knowledge representation, integration, and maintenance
  • Efficient and scalable infrastructure and algorithms for inferencing
  • Presentation and explanation of knowledge
  • Multilingual computation

Going forward, Google will be collaborating with academic researchers on a position paper related to these topics. We also welcome faculty interested in contributing to further research in this area to submit a proposal to the Faculty Research Awards program. Faculty Research Awards are one-year grants to researchers working in areas of mutual interest.

The faculty attendees responded positively to the focused workshop format, as it allowed time to go in depth into important and timely research questions. Encouraged by their feedback, we are considering similar workshops on other topics in the future.

Tuesday, 19 November 2013

Unique Strategies for Scaling Teacher Professional Development

Posted by Candice Reimers, Senior Program Manager

Research shows that professional development for educators has a direct, positive impact on students, so it’s no wonder that institutions are eager to explore creative ways to enhance professional development for K-12 teachers. Open source MOOC platforms, such as Course Builder, offer the flexibility to extend the reach of standard curriculum; recently, several courses have launched that demonstrate new and creative applications of MOOCs. With their wide reach, participant engagement, and rich content, MOOCs that offer professional development opportunities for teachers bring flexibility and accessibility to an important area.

This summer, the ScratchEd team out of Harvard University launched the Creative Computing MOOC, a 6-week, self-paced workshop focused on building computational thinking skills in the classroom. The MOOC drew 2,600 participants, who created more than 4,700 Scratch projects and engaged in 3,500 forum discussions; the in-person class held last year, by comparison, reached only 50 educators.

Other creative uses of Course Builder for educator professional development come from National Geographic and Annenberg Learner, which joined forces to develop Water: The Essential Resource, a course developed around California’s Education and Environment Initiative. The Friday Institute’s MOOC, Digital Learning Transitions, focused on the benefits of utilizing educational technology and reached educators across 50 states and 68 countries worldwide. The course design included embedded peer support, project-based learning, and case studies; a post-course survey showed that an overwhelming majority of respondents “were able to personalize their own learning experiences” in an “engaging, easy to navigate” curriculum and greatly appreciated the 24/7 access to materials.

In addition to participant surveys, course authors using the Course Builder platform are able to conduct deeper analysis via web analytics and course data to assess course effectiveness and make improvements for future courses.

New opportunities to experience professional development MOOCs are rapidly emerging; the University of Adelaide recently announced their Digital Technology course to provide professional development for primary school teachers on the new Australian curriculum, the Google in Education team just launched a suite of courses for teachers using Google technologies, and the Friday Institute course that aligns with the U.S. based Common Core State Standards is now available.

We’re excited about the innovative approaches underway and the positive impact they can have for students and teachers around the world. We also look forward to seeing new, creative applications of MOOC platforms in uncharted territory.

Friday, 15 November 2013

Moore’s Law Part 4: Moore's Law in other domains

This is the last entry of a series focused on Moore’s Law and its implications moving forward, edited from a white paper on Moore’s Law written by Google University Relations Manager Michel Benard. This series quotes major sources about Moore’s Law and explores how they believe Moore’s Law will likely continue over the course of the next several years. We also explore whether fields other than digital electronics show an emerging Moore’s Law of their own, or the promise of one that would drive their future performance.

--

The quest for Moore’s Law and its potential impact in other disciplines is a journey the technology industry is just starting: crossing the Rubicon from the semiconductor industry into less explored fields, but with the particular mindset created by Moore’s Law. Our goal is to explore whether Moore’s Law opportunities are emerging in other disciplines, and what their potential impact might be. To that end, we interviewed several professors and researchers and asked them whether they could see emerging ‘Moore’s Laws’ in their disciplines. Listed below are some highlights of those discussions, ranging from CS+ to potential applications in the energy sector:

Sensors and Data Acquisition
Ed Parsons, Google Geospatial Technologist
The More than Moore discussion can be extended beyond the main chip, to other components on the same board or elsewhere within the device a user carries. Greater sensor capabilities (for measuring pressure, electromagnetic fields and other local conditions) allow sensors to be included in smartphones, glasses, and other devices to perform local data acquisition. This trend is strong, and should allow future devices benefiting from Moore’s Law to receive enough data to perform more complex applications.

Metcalfe’s Law states that the value of a telecommunication network is proportional to the square of the number of connected nodes in the system. This law can be used alongside Moore’s Law to evaluate the value of the Internet of Things. The network itself can be seen as composed of layers: at the user’s local level (capturing data related to the user’s body or to immediately accessible objects), locally around the user (such as data from within the same street), and finally globally (data from the global internet). The extrapolation made earlier in this series (several TB available in flash memory) will make it possible to construct, exchange and download/upload entire contexts for a given situation or application, and to use these contexts with very little or no network activity.

Future of Moore’s Law and its impact on Physics
Sverre Jarp, CERN
CERN and its experiments with the Large Electron-Positron Collider (LEP) and the Large Hadron Collider (LHC) generate data on the order of a petabyte per year; this data has to be filtered, processed and analyzed in order to find meaningful physics events leading to new discoveries. In this context Moore’s Law has been particularly helpful, allowing computing power, storage and networking capabilities at CERN and at other High Energy Physics (HEP) centers to scale up regularly. Several generations of hardware and software have been exhausted during the journey from mainframes to today’s clusters.

CERN has a long tradition of collaboration with chip manufacturers, hardware and software vendors to understand and predict next trends in the computing evolution curve. Recent analysis indicates that Moore’s Law will likely continue over the next decade. The statement of ‘several TB of flash memory availability by 2025’ may even be a little conservative according to most recent analysis.

Big Data Visualizations
Katy Börner, Indiana University
Thanks to Moore’s Law, the amount of data available for any given phenomenon, whether sensed or simulated, has been growing by several orders of magnitude over the past decades. Intelligent sampling can be used to filter out the most relevant bits of information and is practiced in Physics, Astronomy, Medicine and other sciences. Subsequently, data needs to be analyzed and visualized to identify meaningful trends and phenomena, and to communicate them to others.

While most people learn in school how to read charts and maps, many never learn how to read a network layout; data literacy remains a challenge. The Information Visualization Massive Open Online Course (MOOC) at Indiana University teaches students from more than 100 countries not only how to read but also how to design meaningful network, topical, geospatial, and temporal visualizations. Using the tools introduced in this free course, anyone can analyze, visualize, and navigate complex data sets to understand patterns and trends.

Candidate for Moore’s Law in Energy
Professor Francesco Stellacci, EPFL
It is currently hard to see a “Moore’s Law” emerging among candidate energy technologies. Nuclear fusion could hold some positive surprises if several significant breakthroughs occur in the process of creating usable energy with this technique. For any other technology, growth will be slower. Today’s best solar cells have about 30% efficiency, which could of course improve, though by no more than about a factor of 3; cost could also be driven down by an order of magnitude. Best estimates, however, show a combined performance improvement of roughly a factor of 30 over many years.

Further Discussion of Moore’s Law in Energy
Ross Koningstein, Google Director Emeritus
As of today there is no obvious Moore’s Law in the energy sector that could decrease some major costs by 50% every 18 months. However, material properties at the nanoscale and chemical processes such as catalysis are being investigated and could lead to promising results. The targeted applications are hydrocarbon creation at scale and the improvement of oil refinery processes, where breakthroughs in micro/nano-property catalysts are being pursued. Hydrocarbons are much more compatible at scale with the existing automotive/aviation and natural gas distribution systems. Here in California, Google Ventures has invested in Cool Planet Energy Systems, a company with neat technology that can convert biomass to gasoline/jet fuel/diesel with impressive efficiency.

One of the challenges is the ability to run many experiments at low cost per experiment, instead of only a few expensive experiments per year. Discoveries are likely to happen faster if more experiments are conducted. This requires heavier investment, which is difficult to achieve in slim-margin businesses. Disruptive businesses are therefore likely to be nurtured by new players, alongside those existing players that decide to fund significant new investments.

Of course, these discussions could be opened for many other sectors. The opportunities for more discourse on the impact and future of Moore’s Law on CS and other disciplines are abundant, and can be continued with your comments on the Research at Google Google+ page. Please join, and share your thoughts.

Thursday, 14 November 2013

The first detailed maps of global forest change

Posted by Matt Hansen and Peter Potapov, University of Maryland; Rebecca Moore and Matt Hancher, Google

Most people are familiar with exploring images of the Earth’s surface in Google Maps and Earth, but of course there’s more to satellite data than just pretty pictures. By applying algorithms to time-series data it is possible to quantify global land dynamics, such as forest extent and change. Mapping global forests over time not only enables many science applications, such as climate change and biodiversity modeling efforts, but also informs policy initiatives by providing objective data on forests that are ready for use by governments, civil society and private industry in improving forest management.

In a collaboration led by researchers at the University of Maryland, we built a new map product that quantifies global forest extent and change from 2000 to 2012. This product is the first of its kind, a global 30 meter resolution thematic map of the Earth’s land surface that offers a consistent characterization of forest change at a resolution that is high enough to be locally relevant as well. It captures myriad forest dynamics, including fires, tornadoes, disease and logging.

Global 30 meter resolution thematic maps of the Earth’s land surface: Landsat composite reference image (2000), summary map of forest loss, extent and gain (2000-2012), and individual maps of forest extent, gain, loss, and loss color-coded by year.
The satellite data came from the Enhanced Thematic Mapper Plus (ETM+) sensor onboard the NASA/USGS Landsat 7 satellite. The expertise of NASA and USGS, from satellite design to operations to data management and delivery, is critical to any earth system study using Landsat data. For this analysis, we processed over 650,000 ETM+ images in order to characterize global forest change.

Key to the study’s success was the collaboration between remote sensing scientists at the University of Maryland, who developed and tested models for processing and characterizing the Landsat data, and computer scientists at Google, who oversaw the implementation of the final models using Google’s Earth Engine computation platform. Google Earth Engine is a massively parallel technology for high-performance processing of geospatial data, and houses a copy of the entire Landsat image catalog. For this study, a total of 20 terapixels of Landsat data were processed using one million CPU-core hours on 10,000 computers in parallel, in order to characterize year 2000 percent tree cover and subsequent tree cover loss and gain through 2012. What would have taken a single computer 15 years to perform was completed in a matter of days using Google Earth Engine computing.

Global forest loss totaled 2.3 million square kilometers and gain 0.8 million square kilometers from 2000 to 2012. Among the many results is the finding that tropical forest loss is increasing with an average of 2,101 additional square kilometers of forest loss per year over the study period. Despite the reduction in Brazilian deforestation over the study period, increasing rates of forest loss in countries such as Indonesia, Malaysia, Tanzania, Angola, Peru and Paraguay resulted in a statistically significant trend in increasing tropical forest loss. The maps and statistics from this study fill an information void for many parts of the world. The results can be used as an initial reference for countries lacking such information, as a spur to capacity building in such countries, and as a basis of comparison in evolving national forest monitoring methods. Additionally, we hope it will enable further science investigations ranging from the evaluation of the integrity of protected areas to the economic drivers of deforestation to carbon cycle modeling.

The Chaco woodlands of Bolivia, Paraguay and Argentina are under intensive pressure from agroindustrial development. Paraguay’s Chaco woodlands in the western half of the country are experiencing rapid deforestation as land is cleared for cattle ranches, resulting in the highest rate of deforestation in the world.
Global map of forest change: http://earthenginepartners.appspot.com/science-2013-global-forest
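
For readers with Earth Engine access who want to explore the results programmatically, here is a hedged sketch using the Earth Engine Python API. The asset ID, band names, and the bounding box over the Paraguayan Chaco are assumptions based on the published dataset's naming and should be confirmed in the Earth Engine data catalog.

```python
# Illustrative sketch (requires an authorized Earth Engine account).
# The asset ID, band names, and region coordinates are assumptions -- confirm
# them in the Earth Engine data catalog before use.
import ee

ee.Initialize()

gfc = ee.Image('UMD/hansen/global_forest_change_2013')  # assumed asset ID
treecover2000 = gfc.select('treecover2000')   # percent tree cover in 2000
loss = gfc.select('loss')                     # 1 where forest was lost, 2000-2012
gain = gfc.select('gain')                     # 1 where forest was gained

# Example: approximate forest-loss area (km^2) within a rough bounding box
# over the Paraguayan Chaco (coordinates for illustration only).
region = ee.Geometry.Rectangle([-62.0, -23.5, -58.0, -19.5])
loss_area = loss.multiply(ee.Image.pixelArea()).reduceRegion(
    reducer=ee.Reducer.sum(), geometry=region, scale=30, maxPixels=1e10)
print('Forest loss (km^2):',
      ee.Number(loss_area.get('loss')).divide(1e6).getInfo())
```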

If you are curious to learn more, tune in next Monday, November 18 to a live-streamed, online presentation and demonstration by Matt Hansen and colleagues from UMD, Google, USGS, NASA and the Moore Foundation:

Live-stream Presentation: Mapping Global Forest Change
Live online presentation and demonstration, followed by Q&A
Monday, November 18, 2013 at 1pm EST, 10am PST
Link to live-streamed event: http://goo.gl/JbWWTk
Please submit questions here: http://goo.gl/rhxK5X

For further results and details of this study, see High-Resolution Global Maps of 21st-Century Forest Cover Change in the November 15th issue of the journal Science.

Wednesday, 13 November 2013

Moore’s Law, Part 3: Possible extrapolations over the next 15 years and impact



This is the third entry of a series focused on Moore’s Law and its implications moving forward, edited from a white paper on Moore’s Law written by Google University Relations Manager Michel Benard. This series quotes major sources about Moore’s Law and explores how they believe Moore’s Law will likely continue over the course of the next several years. We also explore whether fields other than digital electronics show an emerging Moore’s Law of their own, or the promise of one that would drive their future performance.

--

More Moore
We examine data from the ITRS 2012 Overall Roadmap Technology Characteristics (ORTC 2012) and select some notable data points. The ORTC charts show chip size trends up to the year 2026 along with the “Average Moore’s Law” line. Additionally, in the ORTC 2011 tables we find data on 3D chip layer increases (up to 128 layers), including costs. Finally, the ORTC 2011 index sheet estimates that the DRAM cost per bit at production will be ~0.002 microcents per bit by ~2025 (a back-of-the-envelope check of this figure follows the list below). From these sources we draw three More Moore (MM) extrapolations for the year 2025:

  • 4Tb Flash multi-level cell (MLC) memory will be in production
  • There will be ~100 billion transistors per microprocessing unit (MPU)
  • 1TB RAM Memory will cost less than $100
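
As a rough sanity check on the memory-cost extrapolation, the ORTC per-bit figure can be converted into a cost per terabyte. The snippet below assumes 1 TB = 8 × 10^12 bits; the result lands in the low hundreds of dollars, the same order of magnitude as the last bullet above.

```python
# Back-of-the-envelope check of the DRAM figure (assumes 1 TB = 8e12 bits).
microcents_per_bit = 0.002                            # ORTC 2011 estimate for ~2025
dollars_per_bit = microcents_per_bit * 1e-6 / 100     # microcents -> cents -> dollars
bits_per_terabyte = 8e12
print(f"~${dollars_per_bit * bits_per_terabyte:.0f} per TB of DRAM")   # roughly $160
```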


More than Moore
It should be emphasized that “More than Moore” (MtM) technologies do not constitute an alternative or even a competitor to the digital trend as described by Moore’s Law. In fact, it is the heterogeneous integration of digital and non-digital functionalities into compact systems that will be the key driver for a wide variety of application fields. Whereas MM may be viewed as the brain of an intelligent compact system, MtM refers to its capabilities to interact with the outside world and the users.

As such, functional diversification may be regarded as a complement of digital signal and data processing in a product. This includes the interaction with the outside world through sensors and actuators and the subsystem for powering the product, implying analog and mixed signal processing, the incorporation of passive and/or high-voltage components, micro-mechanical devices enabling biological functionalities, and more. While MtM looks very promising for a variety of diversification topics, the ITRS study does not give figures from which “solid” extrapolations can be made. However, we can make safe/not so safe bets going towards 2025, and examine what these extrapolations mean in terms of the user.

Today we have 1TB hard disk drives (HDDs) for $100, but the access speed to data on the disk does not allow us to take full advantage of this data in a fully interactive, or even practical, way. More importantly, the size and construction of HDDs do not allow them to be incorporated into mobile devices. Solid state drives (SSDs), in comparison, have similar data transfer rates (~1Gb/s), latencies typically 100 times lower than HDDs, and a significantly smaller form factor with no moving parts. The promise of offering several TB of flash memory cost-effectively by 2025, in a device carried along during the day (e.g. smartphone, watch, clothing, etc.), represents a paradigm shift with regard to today’s situation; it will empower users by moving them from an environment where local data needs to be refreshed frequently (as with augmented reality applications) to one where full contextual data is available locally and refreshed only when critically needed.

If data is pre-loaded in the order of magnitude of TBs, one will be able to get a complete contextual data set loaded before an action or a movement, and the device will dispatch its local intelligence to the user during the progress of the action, regardless of network availability or performance. This opens up the possibility of combining local 3D models and remote inputs, allowing applications like 3D conferencing to become available. The development and use of 3D avatars could even facilitate many social interaction models. To benefit from such applications the use of personal devices such as Google Glass may become pervasive, allowing users to navigate 3D scenes and environments naturally, as well as facilitating 3D conferencing and their “social” interactions.

The opportunities for more discourse on the impact and future of Moore’s Law on CS and other disciplines are abundant, and can be continued with your comments on the Research at Google Google+ page. Please join, and share your thoughts.

Tuesday, 12 November 2013

Moore’s Law, Part 2: More Moore and More than Moore

This is the second entry of a series focused on Moore’s Law and its implications moving forward, edited from a white paper on Moore’s Law written by Google University Relations Manager Michel Benard. This series quotes major sources about Moore’s Law and explores how they believe Moore’s Law will likely continue over the course of the next several years. We also explore whether fields other than digital electronics show an emerging Moore’s Law of their own, or the promise of one that would drive their future performance.

--

One of the fundamental lessons derived from the past successes of the semiconductor industry comes from the observation that most of the innovations of the past ten years—those that have indeed revolutionized the way CMOS transistors are manufactured nowadays—were initiated 10–15 years before they were incorporated into the CMOS process. Strained silicon research began in the early 90s, high-κ/metal-gate initiated in the mid-90s and multiple-gate transistors were pioneered in the late 90s. This fundamental observation generates a simple but fundamental question: “What should the ITRS do to identify now what the extended semiconductor industry will need 10–15 years from now?”
- International Technology Roadmap for Semiconductors 2012

More Moore
As we look at the years 2020–2025, we can see that the physical dimensions of CMOS manufacture are expected to be crossing below the 10 nanometer threshold. It is expected that as dimensions approach the 5–7 nanometer range it will be difficult to operate any transistor structure that is utilizing the metal-oxide semiconductor (MOS) physics as the basic principle of operation. Of course, we expect that new devices, like the very promising tunnel transistors, will allow a smooth transition from traditional CMOS to this new class of devices to reach these new levels of miniaturization. However, it is becoming clear that fundamental geometrical limits will be reached in the above timeframe. By fully utilizing the vertical dimension, it will be possible to stack layers of transistors on top of each other, and this 3D approach will continue to increase the number of components per square millimeter even when horizontal physical dimensions will no longer be amenable to any further reduction. It seems important, then, that we ask ourselves a fundamental question: “How will we be able to increase the computation and memory capacity when the device physical limits will be reached?” It becomes necessary to re-examine how we can get more information in a finite amount of space.

The semiconductor industry has thrived on Boolean logic; after all, for most applications CMOS devices have been used as nothing more than an “on-off” switch. Consequently, it becomes of paramount importance to develop new techniques that allow the use of multiple (i.e., more than 2) logic states in any given and finite location, which evokes the magic of “quantum computing” looming in the distance. Short of reaching this goal, however, an active field of research involves increasing the number of states available (e.g. to 4–10 states) and thereby doubling the number of “virtual transistors” every 2 years.


More than Moore
During the blazing progress propelled by Moore’s Law in semiconductor logic and memory products, many “complementary” technologies have progressed as well, although not necessarily scaling according to Moore’s Law. Heterogeneous integration of multiple technologies has generated “added value” for devices with multiple applications, beyond the traditional semiconductor logic and memory products that had led the semiconductor industry from the mid 60s to the 90s. A variety of wireless devices contain typical examples of this confluence of technologies, e.g. logic and memory devices, display technology, microelectromechanical systems (MEMS), and RF and analog/mixed-signal technologies (RF/AMS).

The ITRS has incorporated More than Moore and RF/AMS chapters in the main body of the roadmap, but it is uncertain whether this is sufficient to encompass the plethora of associated technologies now entangled in modern products, or the multi-faceted public consumer who has become an influential driver of the semiconductor industry, demanding custom functionality in commercial electronic products. In the next post of this series, we will examine select data from the ITRS Overall Roadmap Technology Characteristics (ORTC) 2012 and attempt to extrapolate the progress of the next 15 years, and its potential impact.

The opportunities for more discourse on the impact and future of Moore’s Law on CS and other disciplines are abundant, and can be continued with your comments on the Research at Google Google+ page. Please join, and share your thoughts.

Monday, 11 November 2013

Moore’s Law, Part 1: Brief history of Moore's Law and current state

This is the first entry of a series focused on Moore’s Law and its implications moving forward, edited from a white paper on Moore’s Law written by Google University Relations Manager Michel Benard. This series quotes major sources about Moore’s Law and explores how they believe Moore’s Law will likely continue over the course of the next several years. We also explore whether fields other than digital electronics show an emerging Moore’s Law of their own, or the promise of one that would drive their future performance.


---

Moore's Law is the observation that over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years. The period often quoted as "18 months" is due to Intel executive David House, who predicted that period for a doubling in chip performance (being a combination of the effect of more transistors and their being faster). -Wikipedia

Moore’s Law is named after Intel co-founder Gordon E. Moore, who described the trend in his 1965 paper. In it, Moore noted that the number of components in integrated circuits had doubled every year from the invention of the integrated circuit in 1958 until 1965 and predicted that the trend would continue "for at least ten years". Moore’s prediction has proven to be uncannily accurate, in part because the law is now used in the semiconductor industry to guide long-term planning and to set targets for research and development.

The capabilities of many digital electronic devices are strongly linked to Moore's law: processing speed, memory capacity, sensors and even the number and size of pixels in digital cameras. All of these are improving at (roughly) exponential rates as well (see Other formulations and similar laws). This exponential improvement has dramatically enhanced the impact of digital electronics in nearly every segment of the world economy, and is a driving force of technological and social change in the late 20th and early 21st centuries.

Most improvement trends have resulted principally from the industry’s ability to exponentially decrease the minimum feature sizes used to fabricate integrated circuits. Of course, the most frequently cited trend is in integration level, which is usually expressed as Moore’s Law (that is, the number of components per chip doubles roughly every 24 months). The most significant trend is the decreasing cost-per-function, which has led to significant improvements in economic productivity and overall quality of life through proliferation of computers, communication, and other industrial and consumer electronics.

Transistor counts for integrated circuits plotted against their dates of introduction. The curve shows Moore’s Law: the doubling of transistor counts every two years. The y-axis is logarithmic, so the line corresponds to exponential growth.

All of these improvement trends, sometimes called “scaling” trends, have been enabled by large R&D investments. In the last three decades, the growing size of the required investments has motivated industry collaboration and spawned many R&D partnerships, consortia, and other cooperative ventures. To help guide these R&D programs, the Semiconductor Industry Association (SIA) initiated the National Technology Roadmap for Semiconductors (NTRS) in 1992. Since its inception, a basic premise of the NTRS has been that continued scaling of electronics would further reduce the cost per function and promote market growth for integrated circuits. Thus, the Roadmap has been put together in the spirit of a challenge—essentially, “What technical capabilities need to be developed for the industry to stay on Moore’s Law and the other trends?”

In 1998, the SIA was joined by corresponding industry associations in Europe, Japan, Korea, and Taiwan to participate in a 1998 update of the Roadmap and to begin work toward the first International Technology Roadmap for Semiconductors (ITRS), published in 1999. The overall objective of the ITRS is to present industry-wide consensus on the “best current estimate” of the industry’s research and development needs out to a 15-year horizon. As such, it provides a guide to the efforts of companies, universities, governments, and other research providers or funders. The ITRS has improved the quality of R&D investment decisions made at all levels and has helped channel research efforts to areas that most need research breakthroughs.

For more than half a century these scaling trends continued, and sources in 2005 expected it to continue until at least 2015 or 2020. However, the 2010 update to the ITRS has growth slowing at the end of 2013, after which time transistor counts and densities are to double only every three years. Accordingly, since 2007 the ITRS has addressed the concept of functional diversification under the title “More than Moore” (MtM). This concept addresses an emerging category of devices that incorporate functionalities that do not necessarily scale according to “Moore's Law,” but provide additional value to the end customer in different ways.

The MtM approach typically allows for the non-digital functionalities (e.g., RF communication, power control, passive components, sensors, actuators) to migrate from the system board-level into a particular package-level (SiP) or chip-level (SoC) system solution. It is also hoped that by the end of this decade, it will be possible to augment the technology of constructing integrated circuits (CMOS) by introducing new devices that will realize some “beyond CMOS” capabilities. However, since these new devices may not totally replace CMOS functionality, it is anticipated that either chip-level or package level integration with CMOS may be implemented.

The ITRS provides a very comprehensive analysis of the outlook for Moore’s Law looking towards 2020 and beyond. The analysis can be roughly segmented into two trends: More Moore (MM) and More than Moore (MtM). In the next blog in this series, we will look at the recent conclusions in the ITRS 2012 report on both trends.

The opportunities for more discourse on the impact and future of Moore’s Law on CS and other disciplines are abundant, and can be continued with your comments on the Research at Google Google+ page. Please join, and share your thoughts.

Thursday, 17 October 2013

Enhancing Linguistic Search with the Google Books Ngram Viewer

Posted by Slav Petrov and Dipanjan Das, Research Scientists

Our book scanning effort, now in its eighth year, has put tens of millions of books online. Beyond the obvious benefits of being able to discover books and search through them, the project lets us take a step back and learn what the entire collection tells us about culture and language.

Launched in 2010 by Jon Orwant and Will Brockman, the Google Books Ngram Viewer lets you search for words and phrases over the centuries, in English, Chinese, Russian, French, German, Italian, Hebrew, and Spanish. It’s become popular for both casual explorations into language usage and serious linguistic research, and this summer we decided to provide some new ways to search with it.

With our interns Jason Mann, Lu Yang, and David Zhang, we’ve added three new features. The first is wildcards: by putting an asterisk as a placeholder in your query, you can retrieve the ten most popular replacements. For instance, what noun most often follows “Queen” in English fiction? The answer is “Elizabeth”:


This graph also reveals that the frequency of mentions of the most popular queens has been decreasing steadily over time. (Language expert Ben Zimmer shows some other interesting examples in his Atlantic article.) Right-clicking collapses all of the series into a sum, allowing you to see the overall change.

Another feature we’ve added is the ability to search for inflections: different grammatical forms of the same word. (Inflections of the verb “eat” include “ate”, “eating”, “eats”, and “eaten”.) Here, we can see that the phrase “changing roles” has recently surged in popularity in English fiction, besting “change roles”, which earlier dethroned “changed roles”:


Curiously, this switching doesn’t happen when we add non-fiction into the mix: “changing roles” is persistently on top, with an odd dip in the late 1980s. As with wildcards, right-clicking collapses and expands the data:


Finally, we’ve implemented the most common feature request from our users: the ability to search for multiple capitalization styles simultaneously. Until now, searching for common capitalizations of “Mother Earth” required using a plus sign to combine ngrams (e.g., “Mother Earth + mother Earth + mother earth”), but now the case-insensitive checkbox makes it easier:


As with our other two features, right-clicking toggles whether the variants are shown.

We hope these features help you discover and share interesting trends in language use!

Wednesday, 9 October 2013

Opening up Course Builder data

Posted by John Cox and Pavel Simakov, Course Builder Team, Google Research

Course Builder is an experimental, open source platform for delivering massive open online courses. When you run Course Builder, you own everything from the production instance to the student data that builds up while your course is running.

Part of being open is making it easy for you to access and work with your data. Earlier this year we shipped a tool called ETL (short for extract-transform-load) that you can use to pull your data out of Course Builder, run arbitrary computations on it, and load it back. We wrote a post that goes into detail on how you can use ETL to get copies of your data in an open, easy-to-read format, as well as write custom jobs for processing that data offline.

Now we’ve taken the next step and added richer data processing tools to ETL. With them, you can build data processing pipelines that analyze large datasets with MapReduce. Inside Google we’ve used these tools to learn from the courses we’ve run. We provide example pipelines ranging from the simple to the complex, along with formatters to convert your data into open formats (CSV, JSON, plain text, and XML) that play nice with third-party data analysis tools.
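
As a flavor of the kind of offline processing these tools enable, here is a toy sketch that computes per-unit completion rates from an exported file of student progress records. It is not the Course Builder ETL API: the file name and field names ("unit_id", "completed") are hypothetical placeholders for whatever schema your course exports.

```python
# Toy offline analysis of exported course data (illustrative only; the field
# names and file layout below are hypothetical, not Course Builder's schema).
import json
from collections import Counter

def map_completions(path):
    """Map step: emit (unit_id, completed_flag) pairs from exported JSON lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield record["unit_id"], record.get("completed", False)

def reduce_completion_rates(pairs):
    """Reduce step: completion rate per unit."""
    totals, done = Counter(), Counter()
    for unit_id, completed in pairs:
        totals[unit_id] += 1
        done[unit_id] += int(completed)
    return {unit_id: done[unit_id] / totals[unit_id] for unit_id in totals}

if __name__ == "__main__":
    rates = reduce_completion_rates(map_completions("student_progress.json"))
    for unit_id, rate in sorted(rates.items()):
        print(f"unit {unit_id}: {rate:.1%} completed")
```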

We hope that adding robust data processing features to Course Builder will not only provide direct utility to organizations that need to process data to meet their internal business goals, but also make it easier for educators and researchers to gauge the efficacy of the massive open online courses run on the Course Builder platform.

Thursday, 26 September 2013

Projecting without a projector: sharing your smartphone content onto an arbitrary display

Posted by Yang Li, Research Scientist, Google Research

Previously, we presented Deep Shot, a system that allows a user to “capture” an application (such as Google Maps) running on a remote computer monitor via a smartphone camera and bring the application on the go. Today, we’d like to discuss how we support the opposite process, i.e., transferring mobile content to a remote display, again using the smartphone camera.

Although the computing power of today’s mobile devices grows at an accelerated rate, the form factor of these devices remains small, which constrains both the input and output bandwidth for mobile interaction. To address this issue, we investigated how to enable users to leverage nearby IO resources to operate their mobile devices. As part of the effort, we developed Open Project, an end-to-end framework that allows a user to “project” a native mobile application onto an arbitrary display using a smartphone camera, leveraging interaction spaces and input modality of the display. The display can range from a PC or laptop monitor, to a home Internet TV and to a public wall-sized display. Via an intuitive, projection-based metaphor, a user can easily share a mobile application by projecting it onto a target display.

Open Project is an open, scalable, web-based framework for enabling mobile sharing and collaboration. It can make any computer display projectable instantly, with no deployment required. Developers can add support for Open Project to native mobile apps by simply linking a library, requiring no additional hardware or sensors. Participants in our user studies responded very positively to Open Project-enabled applications for mobile sharing and collaboration.



Tuesday, 17 September 2013

Broadening Google Patents

Posted by Jon Orwant, Engineering Manager

Cross-posted with the US Public Policy Blog, the European Public Policy Blog, and Inside Search Blog.

Last year, we launched two improvements to Google Patents: the Prior Art Finder and European Patent Office (EPO) patents. Today we’re happy to announce the addition of documents from four new patent agencies: China, Germany, Canada, and the World Intellectual Property Organization (WIPO). Many of these documents may provide prior art for future patent applications, and we hope their increased discoverability will improve the quality of patents in the U.S. and worldwide.

So if you want to learn about a Chinese dual-drive bicycle, a German valve for inflating bicycle tires, attach a Canadian trailer to your bike, or read the WIPO application for pedalling with one leg, those and millions of other inventions are now available on Google Patents.

Thanks to Google Translate, all patents are available in both their original languages and in English, and you can search across the world’s patents using terms in any of those languages. When there are multiple submission languages, you can move between them with a single click on the tabs at the top of the page, as shown in the screenshot below:


Happy patent searching!

Tuesday, 10 September 2013

We are joining the Open edX platform

Posted by Dan Clancy, Director of Research

A year ago, we released Course Builder, an experimental platform for online education at scale. Since then, individuals have created courses on everything from game theory to philanthropy, offered to curious people around the world. Universities and non-profit organizations have used the platform to experiment with MOOCs, while maintaining direct relationships with their participants. Google has published a number of courses including Introduction to Web Accessibility which opens for registration today. This platform is helping to deliver on our goal of making education more accessible through technology, and enabling educators to easily teach at scale on top of cloud platform services.

Today, Google will begin working with edX as a contributor to the open source platform, Open edX. We are taking what we have learned from Course Builder and applying it to Open edX to further innovate on an open source MOOC platform. We look forward to contributing to edX’s new site, MOOC.org, a new service for online learning that will allow any academic institution, business, or individual to create and host online courses.

Google and edX have a shared mission to broaden access to education, and by working together we can advance toward our goals much faster. In addition, Google, with its breadth of applicable infrastructure and research capabilities, will continue to contribute to the online education space, and we will share our findings directly with the online education community and the Open edX platform.

We support the development of a diverse education ecosystem, as learning expands in the online world. Part of that means that educational institutions should easily be able to bring their content online and manage their relationships with their students. Our industry is in the early stages of MOOCs, and lots of experimentation is still needed to find the best way to meet the educational needs of the world. An open ecosystem with multiple players encourages rapid experimentation and innovation, and we applaud the work going on in this space today.

We appreciate the community that has grown around the Course Builder open source project. We will continue to maintain Course Builder, but are focusing our development efforts on Open edX, and look forward to seeing edX’s MOOC.org platform develop. In the future, we will provide an upgrade path to Open edX and MOOC.org from Course Builder. We hope that our continued contributions to open source education projects will enable anyone who builds online education products to benefit from our technology, services and scale. For learners, we believe that a more open online education ecosystem will make it easier for anyone to pick up new skills and concepts at any time, anywhere.
Read More
Posted in Education, MOOC | No comments

Make Your Websites More Accessible to More Users with Introduction to Web Accessibility

Posted on 05:45 by Unknown
Eve Andersson, Manager, Accessibility Engineering

Cross-posted with the Google Developers Blog

You work hard to build clean, intuitive websites. Traffic is high and still climbing, and your website provides a great user experience for all your users, right? Now close your eyes. Is your website easily navigable? According to the World Health Organization, 285 million people are visually impaired. That’s more than the populations of England, Germany, and Japan combined!

As the web has continued to evolve, websites have become more interactive and complex, and this has led to a reduction in accessibility for some users. Fortunately, there are some simple techniques you can employ to make your websites more accessible to blind and low-vision users and increase your potential audience. Introduction to Web Accessibility is Google’s online course that helps you do just that.



You’ll learn to make easy accessibility updates, starting with your HTML structure, without breaking code or sacrificing a beautiful user experience. You’ll also learn tips and tricks to inspect the accessibility of your websites using Google Chrome extensions. Introduction to Web Accessibility runs with support from Google content experts from September 17th - 30th, and is recommended for developers with basic familiarity with HTML, JavaScript, and CSS.
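For a taste of the kind of issue the course addresses, here is a small, self-contained Python script (our own illustration, not part of the course materials) that flags one common problem, images without alternative text:

# Minimal illustration: flag <img> tags that lack a non-empty alt attribute,
# using only the Python standard library.
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    """Collects the src of every <img> tag that has no non-empty alt text."""
    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_dict = dict(attrs)
            if not attr_dict.get("alt"):
                self.missing_alt.append(attr_dict.get("src", "<no src>"))

sample_html = """
<html><body>
  <img src="logo.png" alt="Company logo">
  <img src="chart.png">
</body></html>
"""

checker = MissingAltChecker()
checker.feed(sample_html)
for src in checker.missing_alt:
    print("Image missing alt text:", src)

The course itself focuses on making fixes in your HTML and on browser-based inspection tools such as the Chrome extensions mentioned above, but the underlying idea of systematically auditing markup for missing accessibility information is the same.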

There’s a lot to learn in the realm of web accessibility, and a lot of work to be done to ensure users aren’t excluded from being able to easily navigate the web. By introducing fundamental tips to improve web usage for users with visual impairments, Introduction to Web Accessibility is a starting point to learn how to build accessibility features into your code. Registration is now open, so sign up today and help push the web toward becoming truly universally accessible.
Read More
Posted in accessibility, MOOC | No comments

Thursday, 5 September 2013

A Comparison of Five Google Online Courses

Posted on 10:15 by Unknown
Posted by Julia Wilkowski, Senior Instructional Designer

Google has taught five open online courses in the past year, reaching nearly 400,000 interested students. In this post I will share observations from experiments with a year’s worth of these courses. We were particularly surprised by how the size of our courses evolved during the year, by how students responded to a non-linear, problem-based MOOC, and by the value that many students got out of the courses even after they ended.

Observation #1: Course size
We have seen varying numbers of registered students across the courses. Our first two courses (Power Searching versions one and two) garnered significant interest, with over 100,000 students registering for each. Our more recent courses have attracted closer to 40,000 students each. It’s likely that this reflects both declining initial interest in MOOCs and students realizing that online courses require a significant commitment of time and effort. We’d like other MOOC content aggregators to share their results so that we can identify overall MOOC patterns.

[Table omitted. *Based on surveys sent only to course completers; other satisfaction scores represent aggregate results of surveys sent to all registrants.]

Observation #2: Completion rates
Comparing these five two-week courses, we notice that most of them show a completion rate (measured as the number of students who meet the course’s criteria for completion divided by the total number of registrants) of between 11% and 16%. Advanced Power Searching was an outlier at only 4%. Why? A possible answer can be found by comparing the culminating projects for each course: Power Searching ended with a multiple-choice test, while Advanced Power Searching had students complete case studies applying their skills to research problems and then, after grading their work, solve a final search challenge.
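For clarity, the completion rate described above is simply completers divided by registrants. The numbers in the short sketch below are hypothetical, chosen only to show a value inside that 11-16% band:

def completion_rate(completers, registrants):
    """Completion rate = students meeting the completion criteria / total registrants."""
    return completers / registrants

# Hypothetical numbers, for illustration only (not actual course statistics):
# 5,200 completers out of 40,000 registrants is a 13% completion rate.
print("{:.0%}".format(completion_rate(5200, 40000)))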

Advanced Power Searching also differed from all of the other courses in the way it presented content and activities. Power Searching offered videos and activities in a highly structured, linear path, whereas Advanced Power Searching presented students with a selection of challenges followed by supporting lessons. We observed a decreasing number of views on each successive challenge page, similar to the pattern in the linear course (see Figure 1).
Figure 1. Unique page views for Power Searching and Advanced Power Searching

Students who did complete Advanced Power Searching expressed satisfaction with the course: 95% of course completers would recommend it to others, compared with 94% of survey respondents from Power Searching. We surmise that the lower completion rate for Advanced Power Searching could be a result of the relative difficulty of the course (it assumed significantly more foundational knowledge than Power Searching), its unstructured nature, or a combination of these and other factors.

Even though completion rates seem low when compared with traditional courses, we are excited about the sheer number of students we’ve reached through our courses (over 51,000 earning certificates of completion). If we offered the same content to classrooms of 30 students, it would take over four and a half years of daily classes to teach the same information!
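The classroom comparison is easy to verify with back-of-the-envelope arithmetic, assuming classes of 30 students and one class session taught per calendar day:

# Rough check of the classroom comparison above (assumptions as stated).
completers = 51000
class_size = 30
sessions_needed = completers / class_size       # 1,700 class sessions
years_of_daily_classes = sessions_needed / 365  # roughly 4.7 years
print(int(sessions_needed), "sessions, about",
      round(years_of_daily_classes, 1), "years of daily classes")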

Observation #3: Students have varied goals
We would also like to move the discussion beyond completion rates. We’ve noticed that students register for online courses for many different reasons. In Mapping with Google, we asked students to select a goal during registration. We discovered that
  • 52% of registrants intended to complete the course
  • 48% merely wanted to learn a few new things about Google’s mapping tools
Post-course surveys revealed that
  • 78% of students achieved the goal they defined at registration
  • 89% of students learned new features of Google Maps
  • 76% reported learning new features of Google Earth
Though a much smaller percentage of students completed course requirements, these statistics show that many of the students attained their learning goals.

Observation #4: Continued interest in post-course access
After each course ended, we kept many of the course materials (videos, activities) available. Though we removed access to the forums, final projects/assessments, and teaching assistants, we have seen significant interest in the content, as measured by Google and YouTube Analytics. The Power Searching course pages have generated nearly three million page views since the courses finished, and viewers have watched over 160,000 hours (18 years!) of course videos. In the two months since Mapping with Google finished, we have seen over 70,000 unique visitors to the course pages.

In all of our courses, we saw a high number of students interested in learning online: 96% of Power Searching participants agreed or strongly agreed that they would take a course in a similar format. We have succeeded in teaching tens of thousands of students to be more savvy users of Google tools. Future posts will take an in-depth look at our experiments with self-graded assessments, community elements that enhance learning, and design elements that influence student success.
Read More
Posted in MOOC | No comments

Monday, 12 August 2013

Google Research Awards: Summer 2013

Posted on 12:45 by Unknown
Posted by Maggie Johnson, Director of Education & University Relations

Another round of the Google Research Awards is complete. This is our biannual open call for proposals on computer science-related topics, including machine learning and structured data, policy, human-computer interaction, and geo/maps. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google scientists and engineers.

This round, we received 550 proposals from 50 countries. After expert reviews and committee discussions, we decided to fund 105 projects. The subject areas that received the highest levels of support were human-computer interaction, systems, and machine learning. In addition, 19% of the funding was awarded to universities outside the U.S.

We noticed some new areas emerging in this round of proposals, in particular increased interest in neural networks, accessibility-related projects, and some innovative ideas in robotics. One project features Android-based multi-robot systems, which are significantly more complex than single-robot systems. Faculty researchers are also looking to explore novel uses of Google Glass, such as an indoor navigation system for blind users and ways Glass can facilitate social interactions.

Congratulations to the well-deserving recipients of this round’s awards. If you are interested in applying for the next round (deadline is October 15), please visit our website for more information.
Read More
Posted in Research Awards, University Relations | No comments

Wednesday, 7 August 2013

Computer Science Teaching Fellows Starting Up in Charleston, SC

Posted on 09:00 by Unknown
Posted by Cameron Fadjo, Program Lead, Computer Science Teaching Fellows

Google recently started an exciting new program to ignite interest in computer science (CS) among K-12 kids. Based at our South Carolina data center, the Computer Science Teaching Fellows program is a two-year postgraduate fellowship for new STEM teachers and CS graduates. The goal is to bring computer science and computational thinking to all children, especially underrepresented minorities and girls, and to close the gap between the ever-increasing demand for computer scientists and the inadequate supply. We hope to learn what really works and scale those best practices regionally and then nationally.

The supply of CS majors in the pipeline has been a concern for many years. In 2007, the Computer Science education community was alarmed by the lack of CS majors and enrollments in US colleges and universities.

[Figure omitted. Source: 2009-2010 CRA Taulbee Survey (http://www.cra.org/resources/)]

This prompted the development of several programs and activities to raise awareness of the demand and opportunities for computer scientists and to spark the interest of K-12 students in CS. For example, the NSF funded curriculum and professional development around the new CS Principles Advanced Placement course. The CSTA published standards for K-12 CS and a report on the limited extent to which schools, districts, and states provide CS instruction to their students. CS advocacy groups such as Computing in the Core and Code.org have played an instrumental role in adding provisions supporting CS education to the reauthorization of the Elementary and Secondary Education Act. More generally, we have seen innovations in online learning with MOOCs, machine learning used to provide personalized learning experiences, and platforms like Khan Academy that enable flipped classrooms.

All of these activities represent a convergence in the CS education space, where existing programs are ready for scale and technological advancements can support that scale in innovative ways. Our Teaching Fellows will be testing after-school programs, classroom curricula, and online CS programs to determine what works and why. They’ll start in the local Charleston area and then spread the best programs and curricula to South Carolina, Georgia, and North Carolina (where we also have large data centers). They are currently preparing programs for the fall semester.

We are very excited about the convergence we are seeing in CS education and the potential to bring many more kids into a field that offers not only great career opportunities but also a shot at really making a difference in the world. We’ll keep you posted on the progress of our Teaching Fellows.


Read More
Posted in Computer Science, Education, K-12 | No comments

Thursday, 25 July 2013

Under the hood of Croatian, Filipino, Ukrainian, and Vietnamese in Google Voice Search

Posted on 14:30 by Unknown
Posted by Eugene Weinstein and Pedro Moreno, Google Speech Team

Although we’ve been working on speech recognition for several years, every new language requires our engineers and scientists to tackle unique challenges. Our most recent additions - Croatian, Filipino, Ukrainian, and Vietnamese - required creative solutions to reflect how each language is used across devices and in everyday conversations.

For example, since Vietnamese is a tonal language, we had to explore how to take tones into consideration. One simple technique is to model the tone and vowel combinations (tonemes) directly in our lexicons. This, however, has the side effect of a larger phonetic inventory, so we had to come up with special algorithms to handle the increased complexity. Additionally, Vietnamese is a heavily diacritized language, with tone markers on a majority of syllables. Since Google Search is very good at returning valid results even when diacritics are omitted, our Vietnamese users frequently omit the diacritics when typing their queries. This creates difficulties for the speech recognizer, which selects its vocabulary from typed queries. To address this, we created a special diacritic restoration algorithm that enables us to present properly formatted text to our users in the majority of cases.
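We haven’t described the restoration algorithm’s internals here, but as a rough sketch of the general idea, a much simpler dictionary-based approach maps each stripped word to its most frequent diacritized form seen in training text. The toy Python below is purely illustrative (it ignores context, and the tiny training set is made up); it is not the production system:

import unicodedata
from collections import Counter, defaultdict

def strip_diacritics(word):
    """Remove combining marks (tones and other diacritics) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_restoration_table(diacritized_sentences):
    """Map each stripped word to its most frequent diacritized form."""
    counts = defaultdict(Counter)
    for sentence in diacritized_sentences:
        for word in sentence.split():
            counts[strip_diacritics(word)][word] += 1
    return {plain: forms.most_common(1)[0][0] for plain, forms in counts.items()}

def restore(query, table):
    """Replace each word with its restored form when one is known."""
    return " ".join(table.get(strip_diacritics(w), w) for w in query.split())

# A made-up two-sentence training set; a real system learns from vast query
# logs and must resolve words whose stripped forms are ambiguous.
table = build_restoration_table(["thời tiết hà nội", "bản đồ hà nội"])
print(restore("thoi tiet ha noi", table))  # prints: thời tiết hà nội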

Filipino also presented interesting challenges. As in other multilingual societies, such as Hong Kong, India, and South Africa, Filipinos often mix several languages in their daily lives. This is called code switching. Code switching complicates the design of pronunciation, language, and acoustic models, and it effectively faces speech scientists with a dilemma: should we build one system per language, or should we combine all languages into one?

In such situations we prefer to model the reality of daily language use in our speech recognizer design: if users mix several languages, our recognizers should do their best to model this behavior. Hence our Filipino voice search system, while mainly focused on the Filipino language, also allows users to mix in English terms.
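One simple way to think about the combined approach is to merge the two pronunciation lexicons into a single vocabulary while keeping track of which language each pronunciation came from. The sketch below is only an illustration of that idea (the words, phone symbols, and merging policy are all made up for the example); a real system must also reconcile the two phone sets and retrain the language and acoustic models:

def merge_lexicons(primary, secondary, secondary_tag):
    """Combine two pronunciation lexicons. Secondary-language entries are kept
    only for words the primary lexicon lacks, with their phones tagged by the
    language they came from."""
    merged = {word: list(phones) for word, phones in primary.items()}
    for word, phones in secondary.items():
        merged.setdefault(word, [p + "_" + secondary_tag for p in phones])
    return merged

# Made-up entries, for illustration only.
filipino = {"kumusta": ["k", "u", "m", "u", "s", "t", "a"]}
english = {"traffic": ["t", "r", "ae", "f", "ih", "k"]}
combined = merge_lexicons(filipino, english, "en")
print(combined["traffic"])  # ['t_en', 'r_en', 'ae_en', 'f_en', 'ih_en', 'k_en']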

The algorithms we’re using to model how speech sounds are spoken in each language make use of our distributed large-scale neural network learning infrastructure (yes, the same one that spontaneously discovered cats on YouTube!). By partitioning the gigantic parameter set of the model, and by evaluating each partition on a separate computation server, we’re able to achieve unprecedented levels of parallelism in training acoustic models.
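Conceptually, the partitioning works like the toy sketch below, which splits one layer’s weight matrix by rows and evaluates each partition on its own worker. In-process threads stand in for separate computation servers here; this is an illustration of the model-parallelism idea, not our actual training infrastructure:

from concurrent.futures import ThreadPoolExecutor

def layer_partition(rows, x):
    """Compute the slice of a layer's output owned by one worker."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def parallel_layer(weights, x, n_workers):
    """Split the weight matrix into row partitions, evaluate each partition on
    its own worker, and concatenate the partial outputs."""
    chunk = (len(weights) + n_workers - 1) // n_workers
    partitions = [weights[i:i + chunk] for i in range(0, len(weights), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_outputs = pool.map(layer_partition, partitions, [x] * len(partitions))
    return [y for part in partial_outputs for y in part]

# A tiny 4x2 "layer" evaluated across two workers.
weights = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(parallel_layer(weights, [1, 2], n_workers=2))  # [5, 11, 17, 23]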

The more people use Google speech recognition products, the more accurate the technology becomes. These new neural network technologies will help us bring you lots of improvements and many more languages in the future.
Read More
Posted in internationalization, Speech | No comments