April 2013 ~ Compact System

Thursday, 25 April 2013

Two Googlers elected to the American Academy of Arts and Sciences

Posted on 13:52 by Unknown

Posted by Alfred Spector, Vice President, Engineering

Cross-posted with the Official Google Blog

On Wednesday, the American Academy of Arts and Sciences announced its list of 2013 elected members. We’re proud to congratulate Peter Norvig, director of research, and Arun Majumdar, vice president for energy, two Googlers who are among the new members elected this year.

Membership in the American Academy of Arts and Sciences is considered one of the nation’s highest honors, with those elected recognized as leaders in the arts, public affairs, business, and academic disciplines. With more than 250 Nobel Prize laureates and 60 Pulitzer Prize winners among its fellows, the American Academy celebrates the exceptional contributions of the elected members to critical social and intellectual issues.

With their election, Peter and Arun join seven other Googlers as American Academy members: Eric Schmidt, Vint Cerf, Alfred Spector, Hal Varian, Ray Kurzweil, and founders Sergey Brin and Larry Page, all of whom embody our commitment to innovation and real-world impact. You can read more detailed summaries of Peter and Arun’s achievements below.

Dr. Peter Norvig, currently director of research at Google, is best known for his broad expertise in computer science and artificial intelligence, exemplified by his co-authorship (with Stuart Russell) of the leading college text, Artificial Intelligence: A Modern Approach. With more than 50 publications and a plethora of webpages, essays and software programs on a wide variety of CS topics, Peter is a catalyst of fundamental research across a wide range of disciplines while remaining a hands-on scientist who writes his own code. Recently, he has taught courses on artificial intelligence and the design of computer programs via massive open online courses (MOOCs). Learn more about Peter and his research on norvig.com.

Dr. Arun Majumdar leads Google.org’s energy initiatives and advises Google on its broader energy strategy. Prior to joining Google last year, he was the founding director of the U.S. Department of Energy's Advanced Research Projects Agency-Energy (ARPA-E), where he served from October 2009 until June 2012. Earlier, he was a professor of mechanical engineering as well as materials science and engineering at the University of California, Berkeley, and headed the Environmental Energy Technologies Division at the Lawrence Berkeley National Laboratory. He has published several hundred papers, patents, and conference proceedings. Find out more about Arun.


Thursday, 11 April 2013

50,000 Lessons on How to Read: a Relation Extraction Corpus

Posted on 09:00 by Unknown

Posted by Dave Orr, Product Manager, Google Research

One of the most difficult tasks in NLP is called relation extraction. It’s an example of information extraction, one of the goals of natural language understanding. A relation is a semantic connection between (at least) two entities. For instance, you could say that Jim Henson was in a spouse relation with Jane Henson (and in a creator relation with many beloved characters and shows).

The goal of relation extraction is to learn relations from unstructured natural language text. The relations can be used to answer questions (“Who created Kermit?”), learn which proteins interact in the biomedical literature, or build a database of hundreds of millions of entities and billions of relations to try to help people explore the world’s information.

To help researchers investigate relation extraction, we’re releasing a human-judged dataset of two relations about public figures on Wikipedia: nearly 10,000 examples of “place of birth”, and over 40,000 examples of “attended or graduated from an institution”. Each of these was judged by at least 5 raters, and can be used to train or evaluate relation extraction systems. We also plan to release more relations of new types in the coming months.

Each relation is in the form of a triple: the relation in question, called a predicate; the subject of the relation; and the object of the relation. In the relation “Stephen Hawking graduated from Oxford University,” Stephen Hawking is the subject, graduated from is the relation, and Oxford University is the object. Subjects and objects are represented by their Freebase MIDs, and the relation is defined as a Freebase property. So in this case, the triple would be represented as:

"pred":"/education/education/institution"
"sub":"/m/01tdnyh"
"obj":"/m/07tgn"

Just having the triples is interesting enough if you want a database of entities and relations, but it doesn’t make much progress towards training or evaluating a relation extraction system. So we’ve also included the evidence for the relation, in the form of a URL and an excerpt from the web page that our raters judged. We’re also including examples where the evidence does not support the relation, so you have negative examples for use in training better extraction systems. Finally, we included IDs and actual judgments of individual raters, so that you can filter triples by agreement.

Gory Details

The corpus itself, extracted from Wikipedia, can be found here: https://code.google.com/p/relation-extraction-corpus/

The files are in JSON format. Each line is a triple with the following fields:

pred: predicate of a triple
sub: subject of a triple
obj: object of a triple
evidences: an array of evidences for this triple
    url: the web page from which this evidence was obtained
    snippet: short piece of text supporting the triple
judgments: an array of judgments from human annotators
    rater: hash code of the identity of the annotator
    judgment: judgment of the annotator. It can take the values "yes" or "no"

Here’s an example:

{"pred":"/people/person/place_of_birth","sub":"/m/026_tl9","obj":"/m/02_286","evidences":[{"url":"http://en.wikipedia.org/wiki/Morris_S._Miller","snippet":"Morris Smith Miller (July 31, 1779 -- November 16, 1824) was a United States Representative from New York. Born in New York City, he graduated from Union College in Schenectady in 1798. He studied law and was admitted to the bar. Miller served as private secretary to Governor Jay, and subsequently, in 1806, commenced the practice of his profession in Utica. He was president of the village of Utica in 1808 and judge of the court of common pleas of Oneida County from 1810 until his death."}],"judgments":[{"rater":"11595942516201422884","judgment":"yes"},{"rater":"16169597761094238409","judgment":"yes"},{"rater":"1014448455121957356","judgment":"yes"},{"rater":"16651790297630307764","judgment":"yes"},{"rater":"1855142007844680025","judgment":"yes"}]}
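As a rough illustration of how the rater judgments could be used to filter triples by agreement, here is a minimal Python sketch. The function names, the 0.8 threshold, and the abbreviated sample line are assumptions made for this example and are not part of the released corpus or any accompanying tooling. The sketch relies only on the "judgments" array and its "judgment" field as described above.

```python
import json
from collections import Counter


def agreement(record):
    """Return (majority_label, share_of_raters_giving_that_label) for one triple.

    On a tie, the label that appears first in the judgments array wins.
    """
    votes = Counter(j["judgment"] for j in record["judgments"])
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


def filter_by_agreement(lines, min_agreement=0.8):
    """Yield (record, majority_label) for triples whose raters agree enough.

    `lines` is any iterable of JSON strings, one triple per line.
    `min_agreement` is a hypothetical threshold, not a value from the corpus.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        label, share = agreement(record)
        if share >= min_agreement:
            yield record, label


# Abbreviated, altered line for illustration only: the snippet is shortened
# and the rater IDs and judgments are made up.
sample = (
    '{"pred":"/people/person/place_of_birth","sub":"/m/026_tl9",'
    '"obj":"/m/02_286","evidences":[{"url":'
    '"http://en.wikipedia.org/wiki/Morris_S._Miller",'
    '"snippet":"Born in New York City, he graduated from Union College."}],'
    '"judgments":[{"rater":"1","judgment":"yes"},'
    '{"rater":"2","judgment":"yes"},{"rater":"3","judgment":"no"}]}'
)

if __name__ == "__main__":
    for record, label in filter_by_agreement([sample], min_agreement=0.6):
        print(record["sub"], record["pred"], record["obj"], label)
```

Triples whose majority label is "no" can serve as negative training examples, while a stricter threshold trades corpus size for cleaner labels.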

The web is chock full of information, put there to be read and learned from. Our hope is that this corpus is a small step towards computational understanding of the wealth of relations to be found everywhere you look.

This dataset is licensed by Google Inc. under the Creative Commons Attribution-ShareAlike 3.0 license.

Thanks to Shaohua Sun, Ni Lao, and Rahul Gupta for putting this dataset together.

Thanks also to Michael Ringgaard, Fernando Pereira, Amar Subramanya, Evgeniy Gabrilovich, and John Giannandrea for making this data release possible.

Posted in Natural Language Processing, Wiki | No comments

Tuesday, 9 April 2013

Advanced Power Searching with Google: Lessons Learned

Posted on 09:30 by Unknown

Posted by Dan Russell, Uber Tech Lead, Search Quality & User Happiness and Maggie Johnson, Director of Education and University Relations

Large classes are something you normally want to avoid like the plague, so being in a class with tens of thousands of students seems like a completely crazy idea.

But in January 2013, Google offered a free “MOOC” (a Massive Open Online Course) to teach Advanced Power Searching (APS) to a wide variety of information professionals.

The wholly online class ran for two weeks and covered advanced research skills in a challenge-based format. A bit more than 35,000 students signed up.

In this case, the large class size was a boon to the students. Not only was there a vigorous discussion of the material on social media, but with a class this large, anytime you had a question, someone else in the class had almost certainly asked the same question and had an answer ready. As in many MOOCs, the large online class size did not stress any lecture hall capacities, but it did give the students the benefit of multicultural classmates who were effectively always present in the social spaces of the MOOC.

A typical Massive Open Online Course (MOOC) is a simple progression through a series of mini-lectures--usually a short video followed by reflective questions, problem sets and a few assessments. MOOCs can have huge numbers of students; dozens have been offered with over 150,000 students enrolled. Based on our experiments with Power Searching with Google in 2012, we wanted to do something different. When we offered Advanced Power Searching with Google (APS) in January of 2013, we decided to try out a number of new ideas.

Through this course, we wanted to enable our students to solve complex research questions using a variety of tools, such as Google Scholar, Patents, Books, and Google+. We defined complex problems that had more than one right answer and more than one way to find those answers.

Unlike a traditional MOOC, the APS course had twelve challenges that students could tackle in any order they liked. There were four easy, four medium and four difficult challenges. Part of the design of the class was to have students discover the skills they’d need to solve the challenges and select appropriate video or text lessons. Students could also access case studies that showed how others solve similar problems.

We called our MOOC design “Choose your own adventure.” Each challenge presented a research question like this:

“You are in the city that is home to the House of Light. Nearby there is a museum in a converted school featuring paintings from the far-away Forest of Honey.

What traditional festival are you visiting?”

In this class, the large cohort of 35,000 students worked through the materials together, using online forums to ask questions as well as Google+ Hangouts to attend office hours and collaborate on solving challenges. Instructor Dan Russell and a group of teaching assistants monitored students’ activities and provided support as needed.

If they needed additional help, students could post a question on the forum or see how others solved the challenge. Students could post their solutions to challenges in a special “Peer explanations” section, a feature that many students appreciated as it let them see how others in the class approached the problem in their own ways.

In analyzing the data, we found that the number of views decreased from each challenge page to the next, indicating that students most likely tried the challenges in the order given. While some liked the ability to jump around, most tended to go through the content linearly. Most students who completed the course tried (or at least looked at) all twelve challenges. Many students who did not complete the course tried three or fewer challenges.

To earn a certificate of completion, students submitted two detailed case studies of how they solved a complex search challenge. Students provided great examples of how they used Google tools to research their family’s history, the origins of common objects, or trips they anticipate taking. In addition to listing their queries, they wrote details about how they knew websites were credible and what they learned along the way.

To assess their work, we experimented with letting the students grade their own assignments based on a rubric. We collected their scores and compared them with a random sample of assignments graded by TAs. There was a moderate yet statistically significant correlation (r=0.44) between student scores and TA scores. In fact, the majority of students graded themselves within two points of how an expert grader assessed their work. This is a positive result since it suggests that self-graded project work in a MOOC can be valuable as a source of insight into student performance.
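For readers who want to run a similar comparison on their own course data, here is a minimal Python sketch of the two statistics mentioned above. It assumes the reported r is a Pearson correlation coefficient, which the post does not state, and the function names and scores are invented purely for illustration. They are not the course's data or grading scale.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


def share_within(xs, ys, tolerance=2):
    """Fraction of paired scores that differ by at most `tolerance` points."""
    close = sum(abs(x - y) <= tolerance for x, y in zip(xs, ys))
    return close / len(xs)


# Made-up scores for illustration only (hypothetical scale and values).
self_scores = [8, 6, 9, 5, 7]
ta_scores = [7, 6, 6, 5, 8]

if __name__ == "__main__":
    print("r =", round(pearson_r(self_scores, ta_scores), 2))
    print("within two points:", share_within(self_scores, ta_scores))
```

Reporting both numbers is useful because a moderate correlation can coexist with most self-assigned scores landing close to the expert's score.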

The challenge format seemed to be effective and motivating for a small, dedicated population of students. We had 35,000 registrants for this advanced course, and 12% earned a certificate of completion. This rate is somewhat lower than what we saw for Power Searching with Google, a more traditional MOOC. Students who did not complete the course reported lack of time and the difficulty of the content as barriers.

One interesting point was that labeling the challenges as easy, medium or difficult likely had an unintentional effect. The first challenge was marked as “easy,” but many people found it difficult. This may have de-motivated students from attempting more difficult challenges. Next time, we plan to ask students if the first challenge was too easy, or too challenging, and then send them to a challenge at an appropriate level of difficulty.

Watch for more MOOCs on our products and services in the coming months. And watch for more experimentation as we apply what we have learned, and try more ideas and new approaches in future online courses.

Posted in Education, MOOC | No comments
