Compact System


Wednesday, 19 December 2012

Conference Report: Workshop on Internet and Network Economics (WINE) 2012

Posted by Vahab Mirrokni, Research Scientist, Google Research New York

Google regularly participates in the WINE conference (Workshop on Internet & Network Economics). WINE'12 took place last week in Liverpool, UK, home to a strong economics and computation group. WINE provides a forum for researchers across disciplines to examine algorithmic and economic problems of mutual interest that have emerged from the Internet over the past decade. For Google, the exchange of ideas at this selective workshop has led to innovation and improvements in our algorithms and auction design, such as our display ad allocation.

Googlers co-authored three papers this year; here’s a synopsis of each, as well as some highlights from invited talks at the conference:

Budget Optimization for Online Campaigns with Positive Carryover Effects
This paper first argues that ad impressions may have a long-term impact on user behavior, citing an earlier WWW '10 paper. Based on this motivation, it presents a scalable budget optimization algorithm for online advertising campaigns in the presence of Markov user behavior. In such settings, showing an ad to a user may change their future actions through a Markov model, so the probability of conversion depends not only on the last ad shown but also on earlier user activity. The paper's main contribution is a simpler algorithm for solving a constrained Markov Decision Process, and it validates this solution via simulations on advertising data sets. The paper was written while Nikolay Archak, a PhD student at NYU's business school, was an intern with the New York market algorithms research team.
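To make the Markov-user setting concrete, here is a toy sketch in Python. It is purely illustrative and is not the paper's constrained-MDP algorithm; the states, transition matrices, budget, and policies are all invented. The point is simply that showing an ad shifts the user's transition probabilities, so the value of an impression depends on the user's whole trajectory, and a per-user budget forces a choice about when to spend impressions.

    # Toy Markov user model (illustrative only): showing an ad changes the user's
    # state transitions, so conversion depends on the whole history of impressions.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical transition matrices: rows = current state, columns = next state,
    # for the states "unaware", "interested", "converted".
    P_no_ad = np.array([[0.95, 0.05, 0.00],
                        [0.30, 0.65, 0.05],
                        [0.00, 0.00, 1.00]])
    P_ad    = np.array([[0.70, 0.28, 0.02],
                        [0.10, 0.70, 0.20],
                        [0.00, 0.00, 1.00]])

    def conversion_rate(policy, horizon=20, budget=5, n_users=20_000):
        """Estimate the conversion rate of an impression policy under a per-user budget."""
        conversions = 0
        for _ in range(n_users):
            state, spent = 0, 0
            for t in range(horizon):
                show = spent < budget and policy(state, t)
                spent += show
                state = rng.choice(3, p=(P_ad if show else P_no_ad)[state])
                if state == 2:          # converted
                    conversions += 1
                    break
        return conversions / n_users

    # Two hypothetical policies: spend the budget early vs. only on "interested" users.
    print("early burst:", conversion_rate(lambda s, t: t < 5))
    print("targeted   :", conversion_rate(lambda s, t: s == 1))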

On Fixed-Price Marketing for Goods with Positive Network Externalities
This paper presents an approximation algorithm for marketing "networked goods" and services that exhibit positive network externalities, meaning that a buyer's value for the good or service increases when other buyers own the good or use the service. Such externalities arise in many products, like operating systems or smartphone services. While most previous research has focused on influence maximization, this paper seeks a revenue-maximizing marketing strategy for such networked goods, structured as follows: the seller selects a set (S) of buyers and gives them the goods for free, then sets a fixed per-unit price (p) at which other consumers can buy the item. The strategy is consistent with practice and easy to implement. The authors use ideas from non-negative submodular maximization to find a revenue-maximizing fixed-price marketing strategy.
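As a back-of-the-envelope illustration of the fixed-price strategy, the Python sketch below invents a small random social network, a simple valuation model, and a plain greedy choice of the free-sample set. It is not the authors' algorithm (which relies on non-negative submodular maximization), but it shows how seeding a few buyers for free can raise revenue when values depend on neighbors' adoption.

    # Toy illustration (not the paper's algorithm): fixed-price revenue on a good
    # with positive network externalities. The network, values, and greedy seeding
    # are invented for the sketch.
    import random

    random.seed(0)
    n = 30
    neighbors = {i: {j for j in range(n) if j != i and random.random() < 0.15}
                 for i in range(n)}
    for i in range(n):                       # make the friendship graph symmetric
        for j in neighbors[i]:
            neighbors[j].add(i)
    base = {i: random.uniform(0, 1) for i in range(n)}   # intrinsic values
    ALPHA = 0.3                              # strength of the positive externality

    def adopters(S, p):
        """Fixed point of adoption: buyer i buys if base_i plus ALPHA times the
        number of adopting neighbours reaches p; the set S adopts by construction."""
        A = set(S)
        changed = True
        while changed:
            changed = False
            for i in range(n):
                if i not in A and base[i] + ALPHA * len(neighbors[i] & A) >= p:
                    A.add(i)
                    changed = True
        return A

    def revenue(S, p):
        return p * len(adopters(S, p) - set(S))   # free copies earn nothing

    p, S = 0.9, set()
    while True:          # greedily grow the free-sample set while revenue improves
        best = max((i for i in range(n) if i not in S),
                   key=lambda i: revenue(S | {i}, p), default=None)
        if best is None or revenue(S | {best}, p) <= revenue(S, p):
            break
        S.add(best)
    print(f"price={p}, free samples={sorted(S)}, revenue={revenue(S, p):.2f}")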

The AND-OR game: Equilibrium Characterization
Yishay Mansour, former Visiting Faculty in Google New York, presented the results. He first noted that the existence and uniqueness of market equilibria is known only for markets with divisible goods and concave or convex utilities, and then described a simple AND-OR market game for divisible goods. To my surprise, he showed that a class of mixed strategies is essentially the unique set of randomized equilibria for this market (up to minor changes in the outcome). He closed by challenging the audience to give such a characterization for more general markets with indivisible goods.

Kamal Jain of eBay Research gave an interesting talk about mechanism design problems inspired by applications at companies like eBay and Google. In one part, Kamal proposed "coopetitive ad auctions" for settings in which the auctioneer runs an auction among buyers who may cooperate with some advertisers while competing with others for advertising slots. He gave context around "product ads"; for example, a retailer like Best Buy may cooperate with a manufacturer like HP to put out a product ad for an HP computer sold at Best Buy. Kamal argued that if the cooperation is not an explicit part of the auction, an advertiser may implicitly end up competing with itself, thus decreasing social welfare. By making the cooperation an explicit part of the auction, he was able to design a mechanism with better social welfare and revenue properties than both first-price and second-price auctions. Kamal also discussed optimal mechanisms for intermediaries, and "surplus auctions" that avoid the cyclic bidding behavior resulting from naive variants of first-price auctions in repeated settings.

David Parkes of Harvard University discussed techniques to combine mechanism design with machine learning or heuristic search algorithms. At one point David discussed how to implement a branch-and-bound search algorithm in a way that results in a "monotone" allocation rule, so that if we implement a VCG-type allocation and pricing rule based on this allocation algorithm, the resulting mechanism becomes truthful. David also presented ways to compute a set of prices for any allocation, respecting incentive compatibility constraints as much as possible. Both of these topics appeared in ACM EC 2012 papers that he had co-authored.

At the business meeting, there was a proposal to change the title of the conference from “workshop” to “conference” or “symposium” to reflect its fully peer-reviewed and archival nature, keeping the same acronym of WINE. (Changing the title to “Symposium on the Web, Internet, and Network Economics” was rejected: SWINE!) WINE 2013 will be held at Harvard University in Boston, MA, and we look forward to reconnecting with fellow researchers in the field and continuing to nurture new developments and research topics.
Posted in economics, market algorithms

Tuesday, 18 December 2012

Using online courses in Spain to teach entrepreneurship

Posted by Francisco Ruiz Anton, Policy Manager, Google Spain

Cross-posted with the Policy by the Numbers Blog

At the end of the third quarter of 2012, roughly 25% of adults in Spain were out of work, and more than half of adults under 24 were unemployed. Recent graduates and young adults preparing to enter the workforce face the toughest job market in decades.

The Internet presents an opportunity for growth and economic development. According to recent research, more than 100,000 jobs in Spain originate from the Internet, which contributes 26.7 billion euros (2.5%) directly to GDP. That impact could triple by 2015 under the right conditions.

One of those conditions is making high-quality education accessible, a point echoed by a recent OECD report on the youth labor market in Spain. This is no easy task: university degrees are in high demand, straining the capacity of existing institutions.

The web has become a way for learners to develop new skills when traditional institutions aren’t an option. Recent courses on platforms like Udacity, Coursera and edX have seen hundreds of thousands of students enroll and participate in courses taught by prestigious professors and lecturers.

Google is partnering with numerous organizations and universities in Spain to organize UniMOOC, an online course intended to educate citizens in Spain and the rest of the Spanish-speaking world about entrepreneurship. It was built with Course Builder, Google’s new open source toolkit for constructing online courses.

To date nearly 10,000 students have registered for the course; over two-thirds of them are from Spain, with the remainder coming from 93 other countries. It recently won the award for "Most innovative project" of 2012 from the newspaper El Mundo.

Spain's situation is not unique in Europe. Policymakers across the continent are asking how best to create economic opportunity for their citizens, and how to ensure that their best and brightest students are on a path toward financial success. Our hope is that the people taking this course will be better equipped with the skills and tools to start businesses that create jobs, pushing not only Spain but Europe and the rest of the world toward economic recovery and growth.

The course is still running, and you’re able to join today.
Posted in Education

Monday, 17 December 2012

Millions of Core-Hours Awarded to Science

Posted by Andrea Held, Program Manager, University Relations

In 2011 Google University Relations launched a new academic research awards program, Google Exacycle for Visiting Faculty, offering up to one billion core-hours to qualifying proposals. We were looking for projects that would consume 100M+ core-hours each and be of critical benefit to society. Not surprisingly, there was no shortage of applications.

Since then, the following seven scientists have been working on-site at Google offices in Mountain View and Seattle. They are here to run large computing experiments on Google’s infrastructure to change the future. Their projects include exploring antibiotic drug resistance, protein folding and structural modelling, drug discovery, and last but not least, the dynamic universe.

Today, we would like to introduce the Exacycle award recipients and their work. Please stay tuned for updates next year.

Simulating a Dynamic Universe with the Large Synoptic Sky Survey
Jeff Gardner, University of Washington, Seattle, WA
Collaborators: Andrew Connolly, University of Washington, Seattle, WA, and John Peterson, Purdue University, West Lafayette, IN

Research subject: The Large Synoptic Survey Telescope (LSST) is one of the most ambitious astrophysical research programs ever undertaken. Starting in 2019, the LSST’s 3.2 Gigapixel camera will repeatedly survey the southern sky, generating tens of petabytes of data every year. The images and catalogs from the LSST have the potential to transform both our understanding of the universe and the way that we engage in science in general.
Exacycle impact: In order to design the telescope to yield the best possible science, the LSST collaboration has undertaken a formidable computational campaign to simulate the telescope itself. This will optimize how the LSST surveys the sky and provide realistic datasets for the development of analysis pipelines that can operate on hundreds of petabytes. Using Exacycle, we are reducing the time required to simulate one night of LSST observing, roughly 5 million images, from 3 months down to a few days. This rapid turnaround will enable the LSST engineering teams to test new designs and new algorithms with unprecedented precision, which will ultimately lead to bigger and better science from the LSST.

Designing and Defeating Antibiotic Drug Resistance
Peter Kasson, Assistant Professor, Departments of Molecular Physiology and Biological Physics and of Biomedical Engineering, University of Virginia

Research subject: Antibiotics have made most bacterial infections routinely treatable. As antibiotic use has become common, bacterial resistance to these drugs has also increased. Recently, some bacteria have arisen that are resistant to almost all antibiotics. We are studying the basis for this resistance, in particular the enzyme that acts to break down many antibiotics. Identifying the critical changes required for pan-resistance will aid surveillance and prevention; it will also help elucidate targets for the development of new therapeutic agents.
Exacycle impact: Exacycle allows us to simulate the structure and dynamics of several thousand enzyme variants in great detail. The structural differences between enzymes from resistant and non-resistant bacteria are subtle, so we have developed methods to compare structural "fingerprints" of the enzymes and identify distinguishing characteristics. The complexity of this calculation and large number of potential bacterial sequences mean that this is a computationally intensive task; the massive computing power offered by Exacycle in combination with some novel sampling strategies make this calculation tractable.


Sampling the conformational space of G protein-coupled receptors
Kai Kohlhoff, Research Scientist at Google
Collaborators: Research labs of Vijay Pande and Russ Altman at Stanford University

Research subject: G protein-coupled receptors (GPCRs) are proteins that act as signal transducers in the cell membrane and influence the response of a cell to a variety of external stimuli. GPCRs play a role in many human diseases, such as asthma and hypertension, and are well established as a primary drug target.
Exacycle impact: Exacycle let us perform many tens of thousands of molecular simulations of membrane-bound GPCRs in parallel using the Gromacs software. With MapReduce, Dremel, and other technologies, we analyzed the hundreds of terabytes of generated data and built Markov State Models. The information contained in these models can help scientists design drugs with higher potency and specificity than those presently available.
Results: Our models let us explore kinetically meaningful receptor states and transition rates, which improved our understanding of the structural changes that take place during activation of a signaling receptor. In addition, we used Exacycle to study the affinity of drug molecules when binding to different receptor states.


Modeling transport through the nuclear pore complex
Daniel Russel, post doc in structural biology, University of California, San Francisco

Research subject: Our goal is to develop a predictive model of transport through the nuclear pore complex (NPC). Developing the model requires understanding how the behavior of the NPC varies as we change the parameters governing the components of the system. Such a model will allow us to understand how transportins, the unstructured domains, and the rest of the cellular milieu interact to determine the efficiency and specificity of macromolecular transport into and out of the nucleus.
Exacycle impact: Since data describing the microscopic behavior of most parts of the nuclear transport process is incomplete and contradictory, we have to explore a larger parameter space than would be feasible with traditional computational resources.
Status: We are currently modeling various experimental measurements of aspects of the nuclear transport process. These experiments range from simple ones containing only a few components of the transport process to measurements on the whole nuclear pore with transportins and cellular milieu.


Large scale screening for new drug leads that modulate the activity of disease-relevant proteins
James Swetnam, Scientific Software Engineer, drugable.org, NYU School of Medicine
Collaborators: Tim Cardozo, MD, PhD - NYU School of Medicine.

Research subject: We are using a high-throughput, CPU-bound procedure known as virtual ligand screening to 'dock' a large sample of bioactive chemical space against the entirety of known protein structures, producing rough estimates of binding energy. Our goal is the first computational picture of how bioactive chemistry with therapeutic potential can affect human and pathogen biology.
Exacycle Impact: Typically, using our academic lab's resources, we could screen a few tens of thousands of compounds against a single protein to try to find modulators of its function. To date, Exacycle has enabled us to screen 545,130 compounds against 8,535 protein structures involved in important and underserved diseases such as cancer, diabetes, malaria, and HIV to look for new leads toward future drugs.
Status: We are currently expanding our screens to an additional 206,190 models from ModBase. We aim to have a public dataset for the research community in the first half of 2013.

Protein Structure Prediction and Design
Michael Tyka, Research Fellow, University of Washington, Seattle, WA

Research subject: The precise relationship between the primary sequence and the three-dimensional structure of proteins is one of the unsolved grand challenges of computational biochemistry. The Baker Lab has made significant progress in recent years by developing more powerful protein structure prediction and design algorithms using the Rosetta Protein Modelling suite.
Exacycle impact: Limitations in the accuracy of the physical model and lack of sufficient computational power have prevented solutions to broader classes of medically relevant problems. Exacycle allows us to improve model quality by conducting large parameter optimization sweeps with a very large dataset of experimental protein structural data. The improved energy functions will benefit the entire theoretical protein research community.

We are also using Exacycle to conduct simultaneous docking and one-sided protein design to develop novel protein binders for a number of medically relevant targets. For the first time, we are able to aggressively redesign backbone conformations at the binding site. This allows for a much greater flexibility in possible binding shapes but also hugely increases the space of possibilities that have to be sampled. Very promising designs have already been found using this method.
Posted in Exacycle, University Relations

Thursday, 13 December 2012

Continuing the quest for future computer scientists with CS4HS

Posted by Erin Mindell, Program Manager, Google Education

Computer Science for High School (CS4HS) began five years ago with a simple question: how can we help create a much-needed influx of CS majors into universities and the workforce? We took our questions to three of our university partners (University of Washington, Carnegie Mellon, and UCLA), and together we came up with CS4HS. The model is based on a "train the trainer" technique: by focusing our efforts on teachers and giving them the skills they need to bring CS into their classrooms, we can reach even more students. With grants from Google, our partner universities created curricula and put together hands-on, community-based workshops for teachers in their local areas.

Since the initial experiment, CS4HS has exploded into a worldwide program, reaching more than 4,000 teachers and 200,000 students either directly or indirectly in more than 34 countries. These hands-on, in-person workshops are a hallmark of our program, and we will continue to fund these projects going forward. (For information on how to apply, please see our website.) The success of this popular program speaks for itself, as we receive more quality proposals each year. But success comes at a price, and we have found that the current format of the workshops is not infinitely scalable.

This is where Research at Google comes in. This year, we are experimenting with a new model for CS4HS workshops. By harnessing the success of online courses such as Power Searching with Google, and utilizing open-source platforms like the one found in Course Builder, we are hoping to put the “M” in “MOOC” and reach a broader audience of educators, eager to learn how to teach CS in their classrooms.

For this pilot, we are looking to sponsor two online workshops to go live in 2013: one geared toward CS teachers, and one geared toward teaching CS to non-CS teachers. This is a way for a university (or several colleges working together) to create one incredible workshop that has the potential to reach thousands of enthusiastic teachers. Just as with our in-person workshops, applications will be open only to colleges, universities, and technical schools of higher learning, as we depend on their curriculum expertise to put together the most engaging programs. For this pilot, we will be looking for MOOC proposals in the US and Canada only.

We are really excited about the possibilities of this new format, and we are looking for quality applications to fund. While applications don't have to run on our Course Builder platform, we may be able to offer some additional support to funded projects that do. If you are interested in joining our experiment or just learning more, you can find information on how to apply on our CS4HS website.

Applications are open until February 16, 2013; we can’t wait to see what you come up with. If you have questions, please email us at cs4hs@google.com.
Posted in Computer Science, Education, MOOC, University Relations

Wednesday, 31 October 2012

Large Scale Language Modeling in Automatic Speech Recognition

Posted by Ciprian Chelba, Research Scientist

At Google, we’re able to use the large amounts of data made available by the Web’s fast growth. Two such data sources are the anonymized queries on google.com and the web itself. They help improve automatic speech recognition through large language models: Voice Search makes use of the former, whereas YouTube speech transcription benefits significantly from the latter.

The language model is the component of a speech recognizer that assigns a probability to the next word in a sentence given the previous ones. As an example, if the previous words are “new york”, the model would assign a higher probability to “pizza” than say “granola”. The n-gram approach to language modeling (predicting the next word based on the previous n-1 words) is particularly well-suited to such large amounts of data: it scales gracefully, and the non-parametric nature of the model allows it to grow with more data. For example, on Voice Search we were able to train and evaluate 5-gram language models consisting of 12 billion n-grams, built using large vocabularies (1 million words), and trained on as many as 230 billion words.
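For intuition, here is a minimal sketch of how such conditional probabilities are estimated from counts. The corpus is a toy and the estimates are plain maximum likelihood; production models are built from billions of words and use smoothing to handle n-grams never seen in training.

    # Minimal n-gram estimation sketch: a toy corpus and maximum-likelihood
    # bigram probabilities (real systems use billions of words plus smoothing).
    from collections import Counter

    corpus = "new york pizza is great . i love new york pizza .".split()

    def ngrams(tokens, n):
        return zip(*(tokens[i:] for i in range(n)))

    unigram_counts = Counter(corpus)
    bigram_counts = Counter(ngrams(corpus, 2))

    def p_next(word, prev):
        """P(word | prev) under the bigram model."""
        return bigram_counts[(prev, word)] / unigram_counts[prev]

    print(p_next("pizza", "york"))    # high: "york pizza" is frequent here
    print(p_next("granola", "york"))  # zero in this toy corpus, hence smoothing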


The computational effort pays off, as highlighted by the plot above: both word error rate (a measure of speech recognition accuracy) and search error rate (a metric we use to evaluate the output of the speech recognition system when used in a search engine) decrease significantly with larger language models.

A more detailed summary of results on Voice Search and a few YouTube speech transcription tasks (authors: Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar) reports what happens as we increase both the amount of training data and the size of the language model estimated from it. Depending on the task, the availability and amount of training data used, the language model size, and the performance of the underlying speech recognizer, we observe relative reductions in word error rate between 6% and 10% for systems across a wide range of operating points.

Cross-posted with the Research at Google G+ Page
Posted in Natural Language Processing, Speech

Thursday, 18 October 2012

Ngram Viewer 2.0

Posted by Jon Orwant, Engineering Manager

Since launching the Google Books Ngram Viewer, we’ve been overjoyed by the public reception. Co-creator Will Brockman and I hoped that the ability to track the usage of phrases across time would be of interest to professional linguists, historians, and bibliophiles. What we didn’t expect was its popularity among casual users. Since the launch in 2010, the Ngram Viewer has been used about 50 times every minute to explore how phrases have been used in books spanning the centuries. That’s over 45 million graphs created, each one a glimpse into the history of the written word. For instance, comparing flapper, hippie, and yuppie, you can see when each word peaked:

Meanwhile, Google Books reached a milestone, having scanned 20 million books. That’s approximately one-seventh of all the books published since Gutenberg invented the printing press. We’ve updated the Ngram Viewer datasets to include a lot of those new books we’ve scanned, as well as improvements our engineers made in OCR and in hammering out inconsistencies between library and publisher metadata. (We’ve kept the old dataset around for scientists pursuing empirical, replicable language experiments such as the ones Jean-Baptiste Michel and Erez Lieberman Aiden conducted for our Science paper.)

At Google, we’re also trying to understand the meaning behind what people write, and to do that it helps to understand grammar. Last summer Slav Petrov of Google’s Natural Language Processing group and his intern Yuri Lin (who’s since joined Google full-time) built a system that identified parts of speech—nouns, adverbs, conjunctions and so forth—for all of the words in the millions of Ngram Viewer books. Now, for instance, you can compare the verb and noun forms of “cheer” to see how the frequencies have converged over time:
Some users requested the ability to combine Ngrams, and Googler Matthew Gray generalized that notion into what we’re calling Ngram compositions: the ability to add, subtract, multiply, and divide Ngram counts. For instance, you can see how “record player” rose at the expense of “Victrola”:
Our info page explains all the details about this curious notion of treating phrases like components of a mathematical expression. We're guessing these compositions will only be of interest to lexicographers, but then again that's what we thought about Ngram Viewer 1.0.

Oh, and we added Italian too, supplementing our current languages: English, Chinese, Spanish, French, German, Hebrew, and Russian. Buon divertimento!

Posted in Google Books, Natural Language Processing, Ngram

Thursday, 4 October 2012

ReFr: A New Open-Source Framework for Building Reranking Models

Posted by Dan Bikel and Keith Hall, Research Scientists at Google

We are pleased to announce the release of an open source, general-purpose framework designed for reranking problems, ReFr (Reranker Framework), now available at: http://code.google.com/p/refr/.

Many types of systems capable of processing speech and human language text produce multiple hypothesized outputs for a given input, each with a score. In the case of machine translation systems, these hypotheses correspond to possible translations from some sentence in a source language to a target language. In the case of speech recognition, the hypotheses are possible word sequences of what was said derived from the input audio. The goal of such systems is usually to produce a single output for a given input, and so they almost always just pick the highest-scoring hypothesis.

A reranker is a system that uses a trained model to rerank these scored hypotheses, possibly inducing a different ranked order. The goal is that by employing a second model after the fact, one can make use of additional information not available to the original model, and produce better overall results. This approach has been shown to be useful for a wide variety of speech and natural language processing problems, and was the subject of one of the groups at the 2011 summer workshop at Johns Hopkins’ Center for Language and Speech Processing. At that workshop, led by Professor Brian Roark of Oregon Health & Science University, we began building a general-purpose framework for training and using reranking models. The result of all this work is ReFr.
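The core idea is easy to sketch. The Python below is not ReFr itself (ReFr is a C++/Hadoop framework), and its hypotheses, features, and weights are made up; it only shows how a second model can rescore an n-best list using evidence the first-pass model did not see, possibly changing the winner.

    # Sketch of the reranking idea only (ReFr itself is a C++/Hadoop framework):
    # a second model rescores an n-best list with features the first pass lacked.
    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        text: str
        first_pass_score: float   # score from the base recognizer / translator
        features: dict            # extra evidence available after the fact

    # Hypothetical 3-best list for one utterance.
    nbest = [
        Hypothesis("recognize speech",   -1.2, {"web_lm": -3.0, "length": 2}),
        Hypothesis("wreck a nice beach", -1.1, {"web_lm": -9.5, "length": 4}),
        Hypothesis("recognise peach",    -1.5, {"web_lm": -7.0, "length": 2}),
    ]

    # A trained reranker would learn these weights (e.g. with a perceptron or
    # log-linear model); they are made up here.
    weights = {"first_pass_score": 1.0, "web_lm": 0.4, "length": -0.1}

    def rerank_score(h):
        score = weights["first_pass_score"] * h.first_pass_score
        score += sum(weights[name] * value for name, value in h.features.items())
        return score

    print(max(nbest, key=rerank_score).text)   # reranker picks "recognize speech"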

From the outset, we designed ReFr with both speed and flexibility in mind. The core implementation is entirely in C++, with a flexible architecture allowing rich experimentation with both features and learning methods. The framework also employs a powerful runtime configuration mechanism to make experimentation even easier. Finally, ReFr leverages the parallel processing power of Hadoop to train and use large-scale reranking models in a distributed computing environment.

Posted in open source

Tuesday, 2 October 2012

EMEA Faculty Summit 2012

Posted by Michel Benard, University Relations Manager

Last week we held our fifth Europe, Middle East and Africa (EMEA) Faculty Summit in London, bringing together 94 of EMEA's foremost computer science academics from 65 universities in 25 countries, along with more than 60 Googlers.

This year's jam-packed agenda included a welcome reception at the Science Museum (plus a tour of the special exhibition "Codebreaker - Alan Turing's life and legacy"), a keynote on "Research at Google" by Alfred Spector, Vice President of Research and Special Initiatives, and a welcome address by Nelson Mattos, Vice President of Engineering and Products in EMEA, covering Google's engineering activity and recent innovations in the region.

The Faculty Summit is a chance for us to meet with academics in Computer Science and other areas to discuss the latest exciting developments in research and education, and to explore ways in which we can collaborate via our University Relations programs.

The two-and-a-half-day program consisted of tech talks, breakout sessions, a panel on online education, and demos. The program covered a variety of computer science topics including Infrastructure, Cloud Computing Applications, Information Retrieval, Machine Translation, Audio/Video, Machine Learning, User Interface, e-Commerce, Digital Humanities, Social Media, and Privacy. For example, Ed H. Chi summarized how researchers use data analysis to understand the ways users share content with their audiences using the Circle feature in Google+. Jens Riegelsberger summarized how UI design and user experience research are essential to creating a seamless experience on Google Maps. John Wilkes discussed some of the research challenges - and opportunities - associated with building, managing, and using computer systems at massive scale. Breakout sessions ranged from technical follow-ups on the talk topics to discussing ways to increase the presence of women in computer science.

We also held one-on-one sessions where academics and Googlers could meet privately and discuss topics of personal interest, such as how to develop a compelling research award proposal, how to apply for a sabbatical at Google or how to gain Google support for a conference in a particular research area.

The Summit provides a great opportunity to build and strengthen research and academic collaborations. Our hope is to drive research and education forward by fostering mutually beneficial relationships with our academic colleagues and their universities.

Posted in Faculty Summit

Tuesday, 18 September 2012

Running Continuous Geo Experiments to Assess Ad Effectiveness

Posted by Jon Vaver, Research Scientist and Lizzy Van Alstine, Marketing Manager

Advertisers have a fundamental need to measure the effectiveness of their advertising campaigns. In a previous paper, we described the application of geo experiments to measuring the impact of advertising on consumer behavior (e.g. clicks, conversions, downloads). This method involves randomly assigning experimental units to control and test conditions and measuring the subsequent impact on consumer behavior. It is a practical way of incorporating the gold standard of randomized experiments into the analysis of marketing effectiveness. However, advertising decisions are not static, and the original method is most applicable to a one-time analysis. In a follow-up paper, we generalize the approach to accommodate periodic (ongoing) measurement of ad effectiveness.

In this expanded approach, the test and control assignments of each geographic region rotate across multiple test periods, and these rotations provide the opportunity to generate a sequence of measurements of campaign effectiveness. The data across test periods can also be pooled to create a single aggregate measurement of campaign effectiveness. These sequential and pooled measurements have smaller confidence intervals than measurements from a series of geo experiments with a single test period. Alternatively, the same confidence interval can be achieved with a reduced magnitude or duration of ad spend change, thereby lowering the cost of measurement. The net result is a better method for periodic and isolated measurement of ad effectiveness.
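As a rough illustration of why pooling helps, the sketch below combines hypothetical per-period effect estimates using inverse-variance weights, a standard way to pool independent measurements; the paper's actual estimator may differ, but the qualitative effect, a tighter confidence interval than any single period gives, is the same.

    # Illustrative pooling of per-period estimates with inverse-variance weights
    # (hypothetical numbers; not necessarily the paper's exact estimator).
    import math

    estimates = [3.1, 2.6, 3.4]        # effect estimate from each test period
    std_errs  = [0.9, 1.1, 0.8]        # and its standard error

    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * x for w, x in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))

    for x, se in zip(estimates, std_errs):
        print(f"single period: {x:.2f} +/- {1.96 * se:.2f}")
    print(f"pooled       : {pooled:.2f} +/- {1.96 * pooled_se:.2f}")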

Posted in ads

Tuesday, 11 September 2012

Power Searching with Google is back

Posted by Dan Russell, Uber Tech Lead, Search Quality & User Happiness

If you missed Power Searching with Google a few months ago or were unable to complete the course the first time around, now’s your chance to sign up again for our free online course that aims to empower our users with the tools and knowledge to find what they’re looking for more quickly and easily.

The community-based course features six 50-minute classes along with interactive activities and the opportunity to hear from search experts and Googlers about how search works. Beginning September 24, you can take the classes over a two-week period, share what you learn with other students in a community forum, and complete the course assessments to earn a certificate of completion.

During the course’s first run in July, people told us how they not only liked learning about new features and more efficient ways to use Google, but they also enjoyed sharing tips and learning from one another through the forums and Hangouts. Ninety-six percent of people who completed the course also said they liked the format and would be interested in taking similar courses, so we plan to offer a suite of upcoming courses in the coming months, including Advanced Power Searching.

Stay tuned for further announcements on those upcoming courses, and don’t forget to register now for Power Searching with Google. You’ll learn about things like how to search by color, image, and time and how to solve harder trivia questions like our A Google a Day questions. We’ll see you when we start up in two weeks!

Posted in Education, MOOC

Helping the World to Teach

Posted by Peter Norvig, Director of Research

In July, Research at Google ran a large open online course, Power Searching with Google, taught by search expert Dan Russell. The course was successful, with 155,000 registered students. Through this experiment, we learned that Google technologies can help bring education to a global audience. So we packaged up the technology we used to build Power Searching and are providing it as an open source project called Course Builder. We want to make this technology available so that others can experiment with online learning.

The Course Builder open source project is an experimental early step for us in the world of online education. It is a snapshot of an approach we found useful and an indication of our future direction. We hope to continue development along these lines, but we wanted to make this limited code base available now, to see what early adopters will do with it, and to explore the future of learning technology. We will be hosting a community building event in the upcoming months to help more people get started using this software. edX shares in the open source vision for online learning platforms, and Google and the edX team are in discussions about open standards and technology sharing for course platforms.

We are excited that Stanford University, Indiana University, UC San Diego, Saylor.org, LearningByGivingFoundation.org, Swiss Federal Institute of Technology in Lausanne (EPFL), and a group of universities in Spain led by Universia, CRUE, and Banco Santander-Universidades are considering how this experimental technology might work for some of their online courses. Sebastian Thrun at Udacity welcomes this new option for instructors who would like to create an online class, while Daphne Koller at Coursera notes that the educational landscape is changing and it is exciting to see new avenues for teaching and learning emerge. We believe Google’s preliminary efforts here may be useful to those looking to scale online education through the cloud.

Along with releasing the experimental open source code, we’ve provided documentation and forums for anyone to learn how to develop and deploy an online course like Power Searching. In addition, over the next two weeks we will provide educators the opportunity to connect with the Google team working on the code via Google Hangouts. For access to the code, documentation, user forum, and information about the Hangouts, visit the Course Builder Open Source Project Page. To see what is possible with the Course Builder technology register for Google’s next version of Power Searching. We invite you to explore this brave new world of online learning with us.



Posted in Education, MOOC, University Relations

Wednesday, 29 August 2012

Users love simple and familiar designs – Why websites need to make a great first impression

Posted by Javier Bargas-Avila, Senior User Experience Researcher at YouTube UX Research

I’m sure you’ve experienced this at some point: You click on a link to a website, and after a quick glance you already know you’re not interested, so you click ‘back’ and head elsewhere. How did you make that snap judgment? Did you really read and process enough information to know that this website wasn’t what you were looking for? Or was it something more immediate?

We form first impressions of the people and things we encounter in our daily lives in an extraordinarily short timeframe. We know the first impression a website’s design creates is crucial in capturing users’ interest. In less than 50 milliseconds, users build an initial “gut feeling” that helps them decide whether they’ll stay or leave. This first impression depends on many factors: structure, colors, spacing, symmetry, amount of text, fonts, and more.

In our study we investigated how users' first impressions of websites are influenced by two design factors:

  1. Visual complexity -- how complex the visual design of a website looks 
  2. Prototypicality -- how representative a design looks for a certain category of websites

We presented screenshots of existing websites that varied in both of these factors -- visual complexity and prototypicality -- and asked users to rate the beauty of each design.

The results show that both visual complexity and prototypicality play crucial roles in the process of forming an aesthetic judgment. It happens within incredibly short timeframes between 17 and 50 milliseconds. By comparison, the average blink of an eye takes 100 to 400 milliseconds.

And these two factors are interrelated: if the visual complexity of a website is high, users perceive it as less beautiful, even if the design is familiar. And if the design is unfamiliar -- i.e., the site has low prototypicality -- users judge it as uglier, even if it’s simple.
In other words, users strongly prefer website designs that look both simple (low complexity) and familiar (high prototypicality). That means if you’re designing a website, you’ll want to consider both factors. Designs that contradict what users typically expect of a website may hurt users’ first impression and damage their expectations. Recent research shows that negative product expectations lead to lower satisfaction in product interaction -- a downward spiral you’ll want to avoid. Go for simple and familiar if you want to appeal to your users’ sense of beauty.

Posted in User Experience

Tuesday, 28 August 2012

Google at UAI 2012

Posted by Kevin Murphy, Research Scientist

The conference on Uncertainty in Artificial Intelligence (UAI) is one of the premier venues for research related to probabilistic models and reasoning under uncertainty. This year's conference (the 28th) set several new records: the largest number of submissions (304 papers, last year 285), the largest number of participants (216, last year 191), the largest number of tutorials (4, last year 3), and the largest number of workshops (4, last year 1). We interpret this as a sign that the conference is growing, perhaps as part of the larger trend of increasing interest in machine learning and data analysis.

There were many interesting presentations. A couple of my favorites included:
  • "Video In Sentences Out," by Andrei Barbu et al. This demonstrated an impressive system that is able to create grammatically correct sentences describing the objects and actions occurring in a variety of different videos. 
  • "Exploiting Compositionality to Explore a Large Space of Model Structures," by Roger Grosse et al. This paper (which won the Best Student Paper Award) proposed a way to view many different latent variable models for matrix decomposition - including PCA, ICA, NMF, Co-Clustering, etc. - as special cases of a general grammar. The paper then showed ways to automatically select the right kind of model for a dataset by performing greedy search over grammar productions, combined with Bayesian inference for model fitting.

A strong theme this year was causality. In fact, we had an invited talk on the topic by Judea Pearl, winner of the 2011 Turing Award, in addition to a one-day workshop. Although causality is sometimes regarded as something of an academic curiosity, its relevance to important practical problems (in medicine, advertising, social policy, and so on) is becoming clearer. There is still a large gap between theory and practice when it comes to making causal predictions, but it was pleasing to see that researchers in the UAI community are making steady progress on this problem.

There were two presentations at UAI by Googlers. The first, "Latent Structured Ranking," by Jason Weston and John Blitzer, described an extension to a ranking model called Wsabie, which was published at ICML in 2011 and is widely used within Google. The Wsabie model embeds a pair of items (say, a query and a document) into a low-dimensional space and uses distance in that space as a measure of semantic similarity. The UAI paper extends this to the setting where there are multiple candidate documents in response to a given query. In such a context, we can get improved performance by leveraging similarities between documents in the set.

The second paper by Googlers, "Hokusai - Sketching Streams in Real Time," was presented by Sergiy Matusevych, Alex Smola and Amr Ahmed. (Amr recently joined Google from Yahoo, and Alex is a visiting faculty member at Google.) This paper extends the Count-Min sketch method for storing approximate counts to the streaming context. This extension allows one to compute approximate counts of events (such as the number of visitors to a particular website) aggregated over different temporal extents. The method can also be extended to store approximate n-gram statistics in a very compact way.
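For readers unfamiliar with the underlying data structure, here is a minimal Count-Min sketch in Python. The width, depth, and hashing choices are illustrative; Hokusai's contribution is the temporal aggregation built on top of structures like this.

    # Basic Count-Min sketch (the building block that Hokusai extends with
    # time-based aggregation); width, depth and hashing here are illustrative.
    import random

    class CountMinSketch:
        def __init__(self, width=2048, depth=4, seed=0):
            rnd = random.Random(seed)
            self.width, self.depth = width, depth
            self.tables = [[0] * width for _ in range(depth)]
            self.salts = [rnd.getrandbits(64) for _ in range(depth)]

        def _cell(self, row, item):
            return hash((self.salts[row], item)) % self.width

        def add(self, item, count=1):
            for row in range(self.depth):
                self.tables[row][self._cell(row, item)] += count

        def estimate(self, item):
            # Collisions can only inflate counts, so take the minimum over rows.
            return min(self.tables[row][self._cell(row, item)]
                       for row in range(self.depth))

    cms = CountMinSketch()
    for _ in range(1000):
        cms.add("visitor:site-A")
    cms.add("visitor:site-B", 7)
    print(cms.estimate("visitor:site-A"), cms.estimate("visitor:site-B"))

The appeal is that memory is fixed in advance (width times depth counters) no matter how many distinct items stream by, at the cost of estimates that can only over-count.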

In addition to these presentations, Google was involved in UAI in several other ways: I served as program co-chair on the organizing committee, several of the referees and attendees work at Google, and Google provided some sponsorship for the conference.

Overall, this was a very successful conference, in an idyllic setting (Catalina Island, an hour off the coast of Los Angeles). We believe UAI and its techniques will grow in importance as various organizations -- including Google -- start combining structured, prior knowledge with raw, noisy unstructured data.

Posted in conference, Machine Learning

Thursday, 23 August 2012

Better table search through Machine Learning and Knowledge

Posted By Johnny Chen, Product Manager, Google Research

The Web offers a trove of structured data in the form of tables. Organizing this collection of information and helping users find the most useful tables is a key mission of Table Search from Google Research. While we are still a long way from the perfect table search, we made a few steps forward recently by revamping how we determine which tables are "good" (ones that contain meaningful structured data) and which are "bad" (for example, a table that merely holds the layout of a Web page). In particular, we switched from a rule-based system to a machine learning classifier that can tease out subtleties from the table features and enables rapid quality improvement iterations. This new classifier is a support vector machine (SVM) that makes use of multiple kernel functions which are automatically combined and optimized using training examples. Several of these kernel combining techniques were in fact studied and developed within Google Research [1,2].
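To show the mechanics of a combined kernel, here is a small illustrative sketch with hand-picked weights and synthetic data; the classifier described above learns the combination automatically from training examples, as in [1,2].

    # Illustrative SVM over a fixed weighted combination of two kernels; the
    # weights here are hand-picked, whereas the classifier in the post learns
    # the combination from training examples.
    from sklearn.datasets import make_classification
    from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=400, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    def combined_kernel(A, B, w_lin=0.3, w_rbf=0.7):
        """Weighted sum of base kernels; a convex combination is itself a kernel."""
        return w_lin * linear_kernel(A, B) + w_rbf * rbf_kernel(A, B, gamma=0.05)

    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(combined_kernel(X_tr, X_tr), y_tr)
    print("held-out accuracy:", clf.score(combined_kernel(X_te, X_tr), y_te))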

We are also able to achieve a better understanding of the tables by leveraging the Knowledge Graph. In particular, we improved our algorithms for identifying the context and topics of each table, the entities represented in the table and the properties they have. This knowledge not only helps our classifier make a better decision on the quality of the table, but also enables better matching of the table to the user query.

Finally, you will notice that we added an easy way for our users to import Web tables found through Table Search into their Google Drive account as Fusion Tables. Now that we can better identify good tables, the import feature enables our users to further explore the data. Once in Fusion Tables, the data can be visualized, updated, and accessed programmatically using the Fusion Tables API.

These enhancements are just the start. We are continually updating the quality of our Table Search and adding features to it.

Stay tuned for more from Boulos Harb, Afshin Rostamizadeh, Fei Wu, Cong Yu and the rest of the Structured Data Team.


[1] Algorithms for Learning Kernels Based on Centered Alignment
[2] Generalization Bounds for Learning Kernels
Posted in Structured Data

Wednesday, 22 August 2012

Machine Learning Book for Students and Researchers

Posted by Afshin Rostamizadeh, Google Research

Our machine learning book, The Foundations of Machine Learning, is now published! The book, with authors from both Google Research and academia, covers a large variety of fundamental machine learning topics in depth, including the theoretical basis of many learning algorithms and key aspects of their applications. The material presented takes its origin in a machine learning graduate course, "Foundations of Machine Learning", taught by Mehryar Mohri over the past seven years and has considerably benefited from comments and suggestions from students and colleagues at Google.

The book can serve as a textbook for both graduate students and advanced undergraduate students, and as a reference manual for researchers in machine learning, statistics, and many other related areas. It includes, as a supplement, introductory material on topics such as linear algebra and optimization and other useful conceptual tools, as well as a large number of exercises at the end of each chapter whose full solutions are provided online.



Posted in Machine Learning, publication

Monday, 20 August 2012

Faculty Summit 2012: Online Education Panel

Posted by Peter Norvig, Director of Research

On July 26th, Google's 2012 Faculty Summit hosted computer science professors from around the world for a chance to talk and hear about some of the work done by Google and by our faculty partners. One of the sessions was a panel on Online Education. Daphne Koller's presentation on "Education at Scale" describes how a talk about YouTube at the 2009 Google Faculty Summit was an early inspiration for her, as she was formulating her approach that led to the founding of Coursera. Koller started with the goal of allowing Stanford professors to have more time for meaningful interaction with their students, rather than just lecturing, and ended up with a model based on the flipped classroom, where students watch videos out of class, and then come together to discuss what they have learned. She then refined the flipped classroom to work when there is no classroom, when the interactions occur in online discussion forums rather than in person. She described some fascinating experiments that allow for more flexible types of questions (beyond multiple choice and fill-in-the-blank) by using peer grading of exercises.

In my talk, I described how I arrived at a similar approach, but starting from a different motivation: I wanted a textbook that was more interactive and engaging than a static paper-based book, so I too incorporated short videos and frequent interactions for the Intro to AI class I taught with Sebastian Thrun.

Finally, Bradley Horowitz, Vice President of Product Management for Google+, gave a talk describing the goals of Google+. The goal is not to build the largest social network; rather, it is to understand our users better so that we can serve them better, while respecting their privacy and keeping each of their conversations within the appropriate circle of friends. This allows people to have more meaningful conversations, within a limited context, and turns out to be very appropriate to education.

By bringing people together at events like the Faculty Summit, we hope to spark the conversations and ideas that will lead to the next breakthroughs, perhaps in online education, or perhaps in other fields. We'll find out a few years from now what ideas took root at this year's Summit.

Posted in Education, Faculty Summit

Tuesday, 14 August 2012

Improving Google Patents with European Patent Office patents and the Prior Art Finder

Posted by Jon Orwant, Engineering Manager

Cross-posted with the US Public Policy Blog, the European Public Policy Blog, and Inside Search Blog

At Google, we're constantly trying to make important collections of information more useful to the world. Since 2006, we’ve let people discover, search, and read United States patents online. Starting this week, you can do the same for the millions of ideas that have been submitted to the European Patent Office, such as this one.

Typically, patents are granted only if an invention is new and not obvious. To explain why an invention is new, inventors will usually cite prior art such as earlier patent applications or journal articles. Determining the novelty of a patent can be difficult, requiring a laborious search through many sources, and so we’ve built a Prior Art Finder to make this process easier. With a single click, it searches multiple sources for related content that existed at the time the patent was filed.

Patent pages now feature a “Find prior art” button that instantly pulls together information relevant to the patent application.

The Prior Art Finder identifies key phrases from the text of the patent, combines them into a search query, and displays relevant results from Google Patents, Google Scholar, Google Books, and the rest of the web. You’ll start to see the blue “Find prior art” button on individual patent pages starting today.
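As a rough sketch of the general approach (not our actual implementation), one could weight terms in the patent's text with TF-IDF against a background corpus and stitch the top terms into a query; the patent text, background documents, and cutoff below are all invented for illustration.

    # Toy sketch of the general idea (not Google's implementation): weight terms
    # in a patent's text with TF-IDF against a background corpus, then stitch the
    # top terms into a prior-art query. All texts below are invented.
    from sklearn.feature_extraction.text import TfidfVectorizer

    patent_text = ("A method for ranking search results in which hyperlinks "
                   "between documents are treated as votes, and the rank of a "
                   "document depends recursively on the ranks of documents "
                   "linking to it.")
    background = [
        "a method and apparatus for storing data on a disk drive",
        "a system for transmitting packets over a wireless network",
        "a method for compressing video frames for streaming",
    ]

    vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    tfidf = vec.fit_transform(background + [patent_text]).toarray()
    terms = vec.get_feature_names_out()
    top = [terms[i] for i in tfidf[-1].argsort()[::-1][:6]]   # patent's top terms

    print("prior-art query:", " ".join(top))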

Our hope is that this tool will give patent searchers another way to discover information relevant to a patent application, supplementing the search techniques they use today. We’ll be refining and extending the Prior Art Finder as we develop a better understanding of how to analyze patent claims and how to integrate the results into the workflow of patent searchers.

These are small steps toward making this collection of important but complex documents better understood. Sometimes language can be a barrier to understanding, which is why earlier this year we released an update to Google Translate that incorporates the European Patent Office’s parallel patent texts, allowing the EPO to provide translation between English, French, German, Spanish, Italian, Portuguese, and Swedish, with more languages scheduled for the future. And with the help of the United States Patent & Trademark Office, we’ve continued to add to our repository of USPTO bulk data, making it easier for researchers and law firms to analyze the entire corpus of US patents. More to come!

Wednesday, 8 August 2012

Teaching the World to Search

Posted by Maggie Johnson, Director of Education and University Relations

For two weeks in July, we ran Power Searching with Google, a MOOC (Massive Open Online Course) similar to those pioneered by Stanford and MIT. We blended this format with our social and communication tools to create a community learning experience around search. The course covered tips and tricks for Google Search, like using the search box as a calculator, or color filtering to find images.

The course had interactive activities to practice new skills and reinforce learning, and many opportunities to connect with other students using tools such as Google Groups, Moderator and Google+. Two of our search experts, Dan Russell and Matt Cutts, moderated Hangouts on Air, answering dozens of questions from students in the course. There were pre-, mid- and post-class assessments that students were required to pass to receive a certificate of completion. The course content is still available.

We had 155,000 students register for the course, from 196 countries. Of these, 29% of those who completed the first assessment passed the course and received a certificate. What was especially surprising was that 96% of the students who completed the course liked the format and would be interested in taking other MOOCs.

This learning format is not new, as anyone who has worked in eLearning over the past 20 years knows. But what makes it different now is the large, global cohort of students who go through the class together. The discussion forums and Google+ streams were very active, with students asking and answering questions and providing additional ideas and content beyond what was offered by the instructor. This learning interaction, enabled by a massive "classroom", is truly a new experience for students and teachers in an online environment.

Going forward, we will be offering Power Searching with Google again, so if you missed the first opportunity to get your certificate, you’ll have a second chance. Watch here for news about Power Searching as well as some educational ideas that we are exploring.

Posted in Education, Google+, MOOC

Monday, 6 August 2012

Speech Recognition and Deep Learning

Posted by Vincent Vanhoucke, Research Scientist, Speech Team

The New York Times recently published an article about Google’s large scale deep learning project, which learns to discover patterns in large datasets, including... cats on YouTube!

What’s the point of building a gigantic cat detector you might ask? When you combine large amounts of data, large-scale distributed computing and powerful machine learning algorithms, you can apply the technology to address a large variety of practical problems.

With the launch of the latest Android platform release, Jelly Bean, we’ve taken a significant step towards making that technology useful: when you speak to your Android phone, chances are, you are talking to a neural network trained to recognize your speech.

Using neural networks for speech recognition is nothing new: the first proofs of concept were developed in the late 1980s(1), and after what can only be described as a 20-year dry spell, evidence that the technology could scale to modern computing resources has recently begun to emerge(2). What changed? Access to larger and larger databases of speech, advances in computing power, including GPUs and fast distributed computing clusters such as the Google Compute Engine, unveiled at Google I/O this year, and a better understanding of how to scale the algorithms to make them effective learners.
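To give a flavor of what "talking to a neural network" means, here is a purely illustrative forward pass of a small feed-forward acoustic model: a window of acoustic feature frames goes in, a distribution over phone labels comes out. The layer sizes are arbitrary and the weights are random stand-ins; real models are far larger and are trained on massive amounts of speech.

    # Purely illustrative forward pass of a tiny feed-forward acoustic model:
    # a window of acoustic feature frames in, a distribution over phone labels
    # out. Sizes are arbitrary and the weights are random stand-ins.
    import numpy as np

    rng = np.random.default_rng(0)
    n_frames, n_feats = 11, 40          # context window: 11 frames of 40-d features
    n_hidden, n_phones = 256, 42

    W1 = rng.normal(0, 0.1, (n_frames * n_feats, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_phones));           b2 = np.zeros(n_phones)

    def phone_posteriors(window):
        """window: (n_frames, n_feats) features -> (n_phones,) posterior probabilities."""
        h = np.maximum(0.0, window.reshape(-1) @ W1 + b1)   # hidden layer (ReLU)
        logits = h @ W2 + b2
        e = np.exp(logits - logits.max())
        return e / e.sum()                                  # softmax

    posterior = phone_posteriors(rng.normal(size=(n_frames, n_feats)))
    print(posterior.argmax(), round(float(posterior.max()), 4))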

The research, which reduces the error rate by over 20%, will be presented(3) at a conference this September, but true to our philosophy of integrated research, we’re delighted to bring the bleeding edge to our users first.

--

1 Phoneme recognition using time-delay neural networks, A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K.J. Lang. IEEE Transactions on Acoustics, Speech and Signal Processing, vol.37, no.3, pp.328-339, Mar 1989.

2 Acoustic Modeling using Deep Belief Networks, A. Mohamed, G. Dahl and G. Hinton. Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing.

3 Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition, N. Jaitly, P. Nguyen, A. Senior and V. Vanhoucke, Accepted for publication in the Proceedings of Interspeech 2012.
Posted in Acoustic Modeling, Deep Learning, Machine Learning, Speech

Thursday, 2 August 2012

Reflections on Digital Interactions: Thoughts from the 2012 NA Faculty Summit

Posted by Alfred Spector, Vice President of Research and Special Initiatives

Last week, we held our eighth annual North America Computer Science Faculty Summit at our headquarters in Mountain View. Over 100 leading faculty joined us from 65 universities located in North America, Asia Pacific and Latin America to attend the two-day Summit, which focused on new interactions in our increasingly digital world.

In my introductory remarks, I shared some themes that are shaping our research agenda. The first relates to the amazing scale of systems we can now contemplate. How can we get to computational clouds of, perhaps, a billion cores (or processing elements)? How can such clouds be efficient and manageable, and what will they be capable of? Google is actively working on most aspects of large-scale systems, and we continue to look for opportunities to collaborate with our academic colleagues. I also noted that we announced a cloud-based program to support education, built on Google App Engine technology.

Another theme in my introduction was semantic understanding. With the introduction of our Knowledge Graph and other work, we are making great progress toward data-driven analysis of the meaning of information. Users, who provide a continual stream of subtle feedback, drive continuous improvement in the quality of our systems, whether the subject is a celebrity, the meaning of a word in context, or a historical event. In addition, we have found that combining information from multiple sources helps us understand meaning more efficiently. When multiple signals are aggregated, particularly with different types of analysis, we have fewer errors and improved semantic understanding. Applying this “combination hypothesis” makes systems more intelligent.
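
As a back-of-the-envelope illustration of why combining signals helps (the numbers are made up, not drawn from any Google system): three independent signals that are each correct 80% of the time yield, under a simple majority vote, a combined accuracy of about 89.6%.

```python
from itertools import product

def majority_vote_accuracy(p, n=3):
    """Probability that a majority of n independent signals, each correct
    with probability p, is correct."""
    total = 0.0
    for outcomes in product([True, False], repeat=n):
        if sum(outcomes) > n / 2:
            prob = 1.0
            for correct in outcomes:
                prob *= p if correct else (1 - p)
            total += prob
    return total

print(majority_vote_accuracy(0.8))  # 0.896: better than any single signal
```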

Finally, I talked about User Experience. Our field is developing ever more creative user interfaces (which both present information to users and accept information from them), partly due to the revolution in mobile computing and partly due to the availability of large-scale processing in the cloud and deeper semantic understanding. There is no doubt that our interactions with computers will be vastly different 10 years from now, and they will be significantly more fluid, or natural.

This page lists the Googler and Faculty presentations at the summit.

One of the highest-intensity sessions we had was the panel on online learning with Daphne Koller from Stanford/Coursera, and Peter Norvig and Bradley Horowitz from Google. While there is a long way to go, I am so pleased that academics are now thinking seriously about how information technology can be used to make education more effective and efficient. The infrastructure and user-device building blocks are there, and I think the community can now quickly get creative and provide the experiences we want for our students. Certainly, our own recent experience with our online Power Searching course shows that the baseline approach works, but it also illustrates how much more can be done.

I asked Elliot Soloway (University of Michigan) and Cathleen Norris (University of North Texas), two faculty attendees, to provide their perspective on the panel, and they have posted their reflections on their blog.

The digital era is changing the human experience. The summit talks and sessions exemplified the new ways in which we interact with devices, each other, and the world around us, and revealed the vast potential for further innovation in this space. Events such as these keep ideas flowing, and it’s immensely fun to be part of this very broadly based computer science community.


Tuesday, 31 July 2012

Natural Language in Voice Search

Posted on 17:42 by Unknown
Posted by Jakob Uszkoreit, Software Engineer

On July 26 and 27, we held our eighth annual Computer Science Faculty Summit on our Mountain View Campus. During the event, we brought you a series of blog posts dedicated to sharing the Summit's talks, panels and sessions, and we continue with this glimpse into natural language in voice search. --Ed

At this year’s Faculty Summit, I had the opportunity to showcase the newest version of Google Voice Search. This version hints at how Google Search, in particular on mobile devices and by voice, will become increasingly capable of responding to natural language queries.

I first outlined the trajectory of Google Voice Search, which was initially released in 2007. Voice actions, launched in 2010 for Android devices, made it possible to control your device by speaking to it. For example, if you wanted to set your device alarm for 10:00 AM, you could say “set alarm for 10:00 AM. Label: meeting on voice actions.” To indicate the subject of the alarm, a meeting about voice actions, you would have to use the keyword “label”! Certainly not everyone would think to frame the requested action this way. What if you could speak to your device in a more natural way and have it understand you?

At last month’s Google I/O 2012, we announced a version of voice actions that supports much more natural commands. For instance, your device will now set an alarm if you say “my meeting is at 10:00 AM, remind me”. This makes even previously existing functionality, such as sending a text message or calling someone, more discoverable on the device -- that is, if you express a voice command in whatever way feels natural to you, whether it be “let David know I’ll be late via text” or “make sure I buy milk by 3 pm”, there is now a good chance that your device will respond the way you anticipated.

I then discussed some of the possibly unexpected decisions we made when designing the system we now use for interpreting natural language queries or requests. For example, as you would expect from Google, our approach to interpreting natural language queries is data-driven and relies heavily on machine learning. In complex machine learning systems, however, it is often difficult to figure out the underlying cause of an error: after supplying them with training and test data, you merely obtain a set of metrics that hopefully give a reasonable indication of the system’s quality, but that fail to explain why a certain input led to a given, possibly wrong, output.

As a result, even understanding why certain mistakes were made requires experts in the field and detailed analysis, rendering it nearly impossible to harness non-experts in analyzing and improving such systems. To avoid this, we aim to make every partial decision of the system as interpretable as possible. In many cases, any fluent speaker of English could look at its possibly erroneous behavior in response to some input and quickly identify the underlying issue - and in some cases even fix it!
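
As a purely hypothetical illustration of what an interpretable partial decision could look like (this is not our production system), consider a tiny rule that matches reminder-style requests and records, step by step, which pattern fired and how each span was interpreted, so a non-expert can see exactly where an interpretation went wrong:

```python
import re

# A hypothetical, deliberately tiny interpreter for reminder-style requests.
# Each step appends a human-readable record of the decision it made.
PATTERN = re.compile(
    r"my (?P<what>.+) is at (?P<time>\d{1,2}(:\d{2})?\s*(am|pm)), remind me",
    re.IGNORECASE,
)

def interpret(utterance):
    trace = []
    m = PATTERN.search(utterance)
    if not m:
        trace.append("no reminder pattern matched")
        return None, trace
    trace.append(f"matched reminder pattern: {PATTERN.pattern!r}")
    trace.append(f"subject span -> {m.group('what')!r}")
    trace.append(f"time span    -> {m.group('time')!r}")
    action = {"action": "set_reminder",
              "subject": m.group("what"),
              "time": m.group("time")}
    return action, trace

action, trace = interpret("my meeting is at 10:00 am, remind me")
print(action)
for step in trace:
    print(" -", step)
```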

We are especially interested in working with our academic colleagues on some of the many fascinating research and engineering challenges in building large-scale, yet interpretable natural language understanding systems and devising the machine learning algorithms this requires.


Friday, 27 July 2012

New Challenges in Computer Science Research

Posted on 15:43 by Unknown
Posted by Jeff Walz, Head of University Relations

Yesterday afternoon at the 2012 Computer Science Faculty Summit, there was a round of lightning talks addressing some of the research problems faced by Google across several domains. The talks pointed out some of the biggest challenges emerging from increasing digital interaction, which is this year’s Faculty Summit theme.

Research Scientist Vivek Kwatra kicked things off with a talk about video stabilization on YouTube. The popularity of mobile devices with cameras has led to an explosion in the amount of video people capture, and casual footage is often shaky. Vivek and his team have found algorithmic approaches that make casual videos look more professional by simulating professional camera moves. Their stabilization technology vastly improves the quality of amateur footage.
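
The core recipe can be sketched in a few lines: estimate the camera's motion between frames, accumulate it into a camera path, compute a smoother path, and re-render each frame so it follows the smooth path instead of the shaky one. The moving-average smoother below is only a stand-in for the more sophisticated optimization the team actually uses, and the motion data is synthetic.

```python
import numpy as np

def smooth_camera_path(per_frame_motion, window=15):
    """Toy stabilization: accumulate frame-to-frame motion into a camera path,
    smooth it with a moving average, and return the per-frame correction that
    would steer each frame onto the smoothed path."""
    path = np.cumsum(per_frame_motion, axis=0)          # raw camera trajectory
    kernel = np.ones(window) / window
    smoothed = np.column_stack([
        np.convolve(path[:, d], kernel, mode="same")
        for d in range(path.shape[1])
    ])
    return smoothed - path                              # shift to apply per frame

# Jittery horizontal/vertical motion for 100 frames (illustrative data).
rng = np.random.default_rng(1)
motion = rng.normal(scale=2.0, size=(100, 2))
corrections = smooth_camera_path(motion)
print(corrections.shape)   # (100, 2): a (dx, dy) correction per frame
```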

Next, Ed Chi (Research Scientist) talked about social media, focusing on the experimental circle model that characterizes Google+. Ed is particularly interested in how social interaction on the web can be designed to mimic live communication. Circles on Google+ allow a user to manage their audience and share content in a targeted fashion, which reflects face-to-face interaction. Ed discussed how, from an HCI perspective, the challenge going forward is the need to consider the trinity of social media: context, audience, and content.

John Wilkes, Principal Software Engineer, talked about cluster management at Google and the challenges of building a new cluster manager -- that is, an operating system for a fleet of machines. Everything at Google is big, and a consequence of operating at such tremendous scale is that machines are bound to fail. John’s team is working to make things easier for internal users and to improve our ability to respond to more system requests. There are several hard problems in this domain, such as issues with configuration, making it as easy as possible to run a binary, increasing failure tolerance, and helping internal users understand their own needs as well as the behavior and performance of their systems in our complicated distributed environment.

Research Scientist and coffee connoisseur Alon Halevy took to the podium to confirm that he did indeed author an empirical book on coffee, and also talked with attendees about structured data on the web. Structured data on the web comprises hundreds of millions of (relatively small) tables of data, and Alon’s work is focused on enabling data enthusiasts to discover and visualize those data sets. Great possibilities open up when people start combining data sets in meaningful ways, which inspired the creation of Fusion Tables. One example is a map made in the aftermath of the 2011 earthquake and tsunami in Japan, showing natural disaster data alongside the locations of the world’s nuclear plants. Moving forward, Alon’s team will continue to think about interesting things that can be done with data, and the techniques needed to distinguish good data from bad data.
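
As a toy illustration of that kind of combination (the tables, column names, and numbers below are invented, and Fusion Tables itself is driven through its web UI and API rather than pandas): joining a table of events to a table of facilities on a shared region key is what makes the merged view mappable.

```python
import pandas as pd

# Two small, made-up tables standing in for independently published data sets.
quakes = pd.DataFrame({
    "region": ["Tohoku", "Kanto", "Chubu"],
    "magnitude": [9.0, 6.1, 5.4],
})
plants = pd.DataFrame({
    "region": ["Tohoku", "Kanto", "Kansai"],
    "nuclear_plants": [4, 2, 3],
})

# Combining the tables on a shared key is what makes the joint view useful:
# here, which regions have both strong shaking and nearby plants.
combined = quakes.merge(plants, on="region", how="inner")
print(combined)
```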

To wrap up the session, Praveen Paritosh did a brief but deep dive into the Knowledge Graph, an intelligent model that understands real-world entities and their relationships to one another -- things, not strings -- which launched earlier this year.

The Google Faculty Summit continued today with more talks and breakout sessions centered on our theme of digital interaction. Check back for additional blog posts in the coming days.


Education in the Cloud

Posted on 08:00 by Unknown
Posted by Andrea Held, University Relations

In the last 10 years, we’ve seen a major transition from stand-alone applications that run on desktop computers to applications running in the cloud. Unfortunately, many computer science students don’t have the opportunity to learn and work in the cloud due to a lack of resources in traditional undergraduate programs. Without this access, students are limited to the resources their school can provide.

So today, we’re announcing a new award program: the Google App Engine Education Awards. We are excited because Google App Engine lets students learn how to build sophisticated, large-scale systems in the cloud without needing access to a large physical network.

Google App Engine can be used to build mobile or social applications, traditional browser-based applications, or stand-alone web services that scale to millions of users with ease. The Google App Engine infrastructure and storage tools are useful for collecting and analyzing educational data, building a learning management system to organize courses, or implementing a teacher forum for exchanging ideas and practices. All of these adaptations of the Google App Engine platform will use the same infrastructure that powers Google.
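
As a rough sketch of how small the starting point can be, here is the classic hello-world handler for the App Engine Python runtime of this era, using the webapp2 framework; the route and message are illustrative, and a real app would pair this with an app.yaml configuration file.

```python
import webapp2

class MainPage(webapp2.RequestHandler):
    """Responds to GET / with a plain-text greeting."""
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello from App Engine!')

# App Engine serves this WSGI application; the routes map URLs to handlers.
app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
```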

We invite teachers at universities across the United States to submit a proposal describing how to use Google App Engine for their course development, educational research or tools, or for student projects. Selected proposals will receive $1,000 in App Engine credits.

If you teach at an accredited college, university or community college in the US, we encourage you to apply. You can submit a proposal by filling out this form. The application deadline is midnight PST August 31, 2012.


Thursday, 26 July 2012

Big Pictures with Big Messages

Posted on 19:29 by Unknown
Posted by Maggie Johnson, Director of Education and University Relations

Google’s Eighth Annual Computer Science Faculty Summit opened today in Mountain View with a fascinating talk by Fernanda Viégas and Martin Wattenberg, leaders of the data visualization group at our Cambridge office. They provided insight into their design process for visualizing big data by highlighting two of their creations: Google+ Ripples and a map of the wind.

To preface his explanation of the design process, Martin shared that his team “wants visualization to be ‘G-rated,’ showing the full detail of the data - there’s no need to simplify it if complexity is done right.” Martin discussed how their wind map started as a personal art project but has gained interest, particularly among groups with a practical stake in the wind (sailors, surfers, firefighters). The map displays surface wind data from the US National Digital Forecast Database and updates hourly. You can zoom around the United States looking for where the winds are fastest - often around lakes or just offshore - or check out the gallery to see snapshots of the wind from days past.
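
For readers who want to play with the idea, here is a minimal sketch of rendering a wind field as a vector plot; the field below is synthetic, and the real map is an animated, particle-based visualization of forecast data rather than a static quiver plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# A synthetic wind field on a coarse grid (illustrative only).
x, y = np.meshgrid(np.linspace(-5, 5, 20), np.linspace(-5, 5, 20))
r = np.hypot(x, y) + 0.1
u = -y / r                      # a gentle rotational flow
v = x / r
speed = np.hypot(u, v)

# Arrows show direction; color shows speed.
plt.quiver(x, y, u, v, speed, cmap="Blues")
plt.title("Illustrative surface wind field")
plt.axis("equal")
plt.savefig("wind_field.png")
```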


Fernanda discussed the development of Google+ Ripples, a visualization that shows how news spreads on Google+. The visualization shows spheres of influence and different patterns of spread. For example, someone might post a video to their Google+ page, and if it goes viral, we’ll see several circles in the visualization. This depicts the influence of different individuals sharing content, both in terms of the number of their followers and the re-shares of the video, and it has revealed that individuals are at times more influential than organizations in the social media domain.


Martin and Fernanda closed with two important lessons in data visualization: first, don’t “dumb down” the data. If complexity is handled correctly and in interesting ways, our users find the details appealing and find their own ways to interact with and expand upon the data. Second, users like to see their personal world in a visualization. Being able to see the spread of a Google+ post, or to zoom in on the wind around one’s town, is what makes a visualization personal and compelling -- we call this the “I can see my house from here” feature.

The Faculty Summit will continue through Friday, July 27 with talks by Googlers and faculty guests as well as breakout sessions on specific topics related to this year’s theme of digital interactions. We will be looking closely at how computation and bits have permeated our everyday experiences via smart phones, wearable computing, social interactions, and education.

We will be posting here throughout the summit with updates and news as it happens.

Wednesday, 25 July 2012

Site Reliability Engineers: “solving the most interesting problems”

Posted on 10:00 by Unknown
Posted by Chris Reid, Sydney Staffing team

I recently sat down with Ben Appleton, a Senior Staff Software Engineer, to talk about his recent move from Software Engineer (SWE) on the Maps team to Site Reliability Engineering (SRE). In the interview, Ben explains why he transitioned from a pure development role to a role in production, and how his work has changed:

Chris: Tell us about your path to Google.
Ben: Before I joined Google I didn’t consider myself a “software engineer”. I went to the University of Queensland and graduated with a Bachelor’s Degree in Electrical Engineering and Mathematics, before going on to complete a Ph.D. My field of research was image segmentation, extending graph cuts to continuous space for analyzing X-rays and MRIs. At a conference in France I met a friend of my Ph.D. advisor’s, and he raved about Google, commenting that they were one of the only companies that really understood technology. I’d already decided academia wasn’t for me, so I interviewed for a general Software Engineering role at Google. I enjoyed the interviews, met some really smart people, and learned about some interesting stuff they were working on. I joined the Maps team in Sydney in 2005 and spent the next 6 years working on the Maps API.

Chris: Tell us about some of the coolest work you did for Google Maps, and how you applied your research background.
Ben: My background in algorithms and computational geometry was really useful. We were basically making browsers do stuff they’re not designed to do, such as rendering millions of vectors or warping images, inventing techniques as we went. On the server-side we focused on content distribution, pushing tiles or vectors from Google servers down through caches to the user’s browser, optimizing for load and latency at every stage. On the client-side, we had to make the most of limited processors with new geometric algorithms and clever prefetching to hide network latency. It was really interesting work.

Chris: I understand you received company-wide recognition when you were managing the Maps API team. Tell us more about what that entailed.
Ben: In September 2008, when I was managing the Maps API, my team received an award that was recognized Google-wide, which is a big honor. My main contributions were latency optimizations, stability, enterprise support, and Street View integration. The award was in recognition of strong, sustained growth of the Maps API, both in the number of sites using it and in total views per day. Currently the Google Maps API is serving more than 600,000 websites.

Chris: So what prompted the move to Site Reliability Engineering (SRE)?
Ben: In my experience, a lot of software engineers don’t understand what SREs do. I’d worked closely with SREs, particularly those in Sydney supporting Maps, and had formed a high opinion of them. They’re a very strong team - they’re smart and they get things done. After 6 years working on the Maps API I felt it was time for a change. In Sydney there are SWE teams covering most of the product areas, including Chrome and Apps, Social and Blogger, Infrastructure Networking and the Go programming language, as well as Maps and GeoCommerce. I talked to all of them, but chose SRE because in my opinion, they’re solving the most interesting problems.

Chris: How would you describe SRE?
Ben: It really depends on the individual. At one end are the Systems Administrator types, sustaining ridiculously large systems. But at the other end are the Software Engineers like me. As SREs get more experienced, this distinction tends to blur. The best SREs think programmatically even if they don’t do the programming. For me, I don’t see a difference in my day-to-day role. When I was working on the Maps API I was the primary on-call one week in three, whereas in SRE the typical on-call roster is one week in six. Being primary on-call just means you’re the go-to person for the team when something breaks, and you’re responsible for pushing new code into production. I was spending 50% of my time doing coding and development work, and as an SRE this has increased to 80%.

Chris: Wow! So as an SRE in Production, you’re spending less time on-call and more time writing code than you were as a SWE on the Maps team?
Ben: Yes! I’m not managing a team now, but I’m definitely spending more time coding than I was before. I guess the average SRE spends 50% of their time doing development work, but as I said, it depends on the person and it ranges from 20-80%.

Chris: What does your team do?
Ben: In Sydney there are SRE teams supporting Maps, Blogger, App Engine, as well as various parts of the infrastructure and storage systems. I’m working on Blobstore, an infrastructure storage service based on Bigtable, which simplifies building and deploying applications that store users' binary data (BLOBs, or "Binary Large OBjects"). Example BLOBs include images, videos, or email attachments - any data objects that are immutable and long-lived. The fact that we're storing user data means that Blobstore must be highly available for reads and writes, be extremely reliable (so that we never lose data), and be efficient in terms of storage usage (so that we can provide large amounts of storage to users at low cost).
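
To make Ben's description concrete, here is a toy, in-memory sketch of an immutable, content-addressed blob interface. It is not Blobstore's actual API; the real service layers Bigtable, replication across machines and data centers, and much more underneath. --Ed

```python
import hashlib

class ToyBlobStore:
    """An in-memory, content-addressed store for immutable blobs.

    Captures only the shape of the interface: blobs are written once,
    never modified, and fetched by key.
    """
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Writing the same bytes twice is a no-op: blobs are immutable,
        # so identical content can safely share one stored copy.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = ToyBlobStore()
key = store.put(b"\x89PNG...")        # e.g. an image attachment
assert store.get(key) == b"\x89PNG..."
```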

Chris: Tell us more about some of the problems you’re solving, and how they differ from those you faced as a SWE in a development role.
Ben: With the massive expansion in online data storage, we’re solving problems at a scale never before seen. Due to the global nature of our infrastructure, we think in terms of load balancing at many levels: across regions, across data centers within a region, and across machines within a data center. The problems we’re facing in SRE are much closer to the metal. We’re constantly optimizing the resource allocation, efficiency, and scalability of Google’s massive computing systems, as opposed to developing new features for a product like Maps. So the nature of the work is very similar to SWE, but the problems are bigger and there is a strong focus on scale.

Chris: Are you planning on staying in SRE for a while?
Ben: Yeah. I signed up for a six-month rotation program called “Mission Control,” the goal of which is to teach engineers the challenges of building and operating a high-reliability service at Google scale. In other words, it’s an SRE training program. In my first three months of Mission Control I’ve been on-call twice, always during office hours so there were SREs to help me when I got into trouble...which I did. I’ve got no intention of going back to SWE at the end of the six months and plan to stay in SRE for at least a few years. Right now the problems seem more interesting. For example, last year’s storage solutions are facing additional strain from the growth of Gmail, Google+ and Google Drive. So you’re constantly reinventing.

Chris: What advice do you have for Software Engineers contemplating a role in SRE?
Ben: SRE gives you the opportunity to work on infrastructure at a really big scale in a way you don’t get to in SWE. Whereas SWE is more about developing new features, SRE is dealing with bigger problems and more complex engineering due to the sheer scale. SRE is a great way to learn how systems really work in order to become a great engineer.

If you’re interested in applying for a Site Reliability Engineering role, please note that we advertise the roles in several different ways to reflect the diversity of the team. The two main roles are “Software Engineer, Google.com” and “Systems Engineer, Google.com”. We use the term “Google.com” to signify that the roles are in Production as opposed to R&D. You can find all the openings listed on the Google jobs site. We’re currently hiring across many regions, including Sydney in Australia, and of course Mountain View in California.

Friday, 13 July 2012

Google at SIGMOD/PODS 2012

Posted on 10:58 by Unknown
Posted by Anish Das Sarma, Research Scientist, and Jeff Shute, Software Engineer

Over the years, SIGMOD has expanded beyond a traditional "database" conference to include several areas related to information management. This year’s ACM SIGMOD/PODS conference (on Management of Data, and Principles of Database Systems), held in Scottsdale, Arizona, was no different. We were impressed by the wide variety of researchers from industry and academia that the conference attracted, and enjoyed learning how others are pushing the limits of scalability in data storage and processing. In addition to an excellent set of papers on a large number of topics, we saw a couple of recurring themes:

1) Data Visualization
  • Pat Hanrahan from Stanford gave a keynote on some of the challenges involved in building systems to enable "data enthusiasts" to manage and visualize data. 
  • Google’s Fusion Tables group also had a paper on this topic: Efficient Spatial Sampling of Large Geographical Tables, by Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy. (This paper has been invited to a TODS special issue on best papers of SIGMOD 2012). 
  • A similar effort from the University of Washington was presented as a demo: VizDeck: Self-Organizing Dashboards for Visual Analytics, by Alicia Key, Bill Howe, Daniel Perry, Cecilia Aragon.

2) Big Data


As has been the case for the last couple of years, “Big Data” has been of ever-growing interest to the entire community, particularly from industry. Google presented a talk on F1, a new distributed database system we’ve built to power the AdWords system. A complex business application like AdWords has different requirements than many systems at Google, which often use storage systems like Bigtable. We have a single database shared by hundreds of developers and systems, so we need the robustness and ease of use we’re used to from traditional databases. F1 is built to scale like Bigtable, without giving up the database features we also need, like strong consistency, ACID transactions, schema enforcement, and, most importantly, SQL queries.

There’s been a widespread trend over the last several years away from databases, towards highly scalable “NoSQL” systems. We don’t think that trade-off is necessary, and were happy to see several other speakers advocate a similar theme -- yes, databases are useful, and developers shouldn’t need to give up database features and ease of use in the name of scalability.
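
F1 itself is internal to Google, but the database features at stake are easy to illustrate with any SQL engine. The SQLite snippet below (with made-up tables and numbers) shows the schema enforcement and all-or-nothing transactions that developers should not have to give up in the name of scale.

```python
import sqlite3

# sqlite3 stands in here only to illustrate the features under discussion:
# a declared schema, and an atomic transaction that commits or rolls back as a unit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaigns (id INTEGER PRIMARY KEY, budget REAL NOT NULL)")
conn.execute("INSERT INTO campaigns VALUES (1, 100.0), (2, 50.0)")

try:
    with conn:  # the 'with' block is one transaction
        conn.execute("UPDATE campaigns SET budget = budget - 30 WHERE id = 1")
        conn.execute("UPDATE campaigns SET budget = budget + 30 WHERE id = 2")
        # If anything here raised, both updates would be rolled back together.
except sqlite3.Error:
    pass

print(conn.execute("SELECT * FROM campaigns ORDER BY id").fetchall())
# [(1, 70.0), (2, 80.0)]
```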

This theme was supported by an industry session on Big Data featuring talks from other companies: Facebook (TAO: How Facebook Serves the Social Graph), Twitter (Large-Scale Machine Learning at Twitter), and Microsoft (Recurring Job Optimization in Scope). Googler Kristen LeFevre was a panelist on the "Perspectives on Big Data" panel organized by Surajit Chaudhuri from Microsoft, which also featured Donald Kossmann from ETHZ, Sam Madden from MIT, and Anand Rajaraman from Walmart Labs. Last but not least, Surajit Chaudhuri also gave an excellent keynote outlining some of the research challenges that the new era of "Big Data and Cloud" poses.

As has been the practice for several years now, SIGMOD also organized events to generate interest in data management research, such as this year's New Researcher Symposium (which included Anish Das Sarma from Google as a panelist).

In addition to sponsoring the conference, many Googlers attended, contributing to a robust presence and affording us the opportunity to interact with the broader information management community. We've been pushing the frontiers of science with cutting-edge research in many aspects of data management, and we were eager to share our innovations and see what others have been working on. We found Amin Vahdat's keynote on the intersection of networking and databases to be a highlight of Google’s participation, which also included presenting papers, participating on panels, and taking part in planning and program committees:

Program Committee Members


Anish Das Sarma, Venkatesh Ganti, Zoltan Gyongyi, Alon Halevy (Tutorials Chair), Kristen LeFevre, Cong Yu

Talks


Symbiosis in Scale Out Networking and Data Management
Amin Vahdat, Google (Keynote)

F1-The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, Phoenix Tong (Googlers)

Finding Related Tables
Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu (Googlers)

Papers


CloudRAMSort: Fast and Efficient Large-Scale Distributed RAM Sort on Shared-Nothing Cluster
Changkyu Kim, Jongsoo Park, Nadathur Satish, Hongrae Lee (Google), Pradeep Dubey, Jatin Chhugani

Efficient Spatial Sampling of Large Geographical Tables
Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy (Googlers)

Panels


Perspectives on Big Data Plenary Session: Privacy and Big Data 
Kristen LeFevre, Google

SIGMOD New Researcher Symposium - How to be a good advisor/advisee? 
Anish Das Sarma, Google

Overall, this year’s SIGMOD was a great conference, widely attended by researchers from industry and academia, with a very interesting mix of research presentations and discussions. Google had a good showing at the conference, and we look forward to continuing this trend in the coming years.