Compact System

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Friday, 25 February 2011

Where does my data live?

Posted on 14:45 by Unknown
Posted by Daniel Ford, Senior Mathematician

Have you ever wondered what happens when you upload a photo to Picasa, or where all your Gmail or YouTube videos are stored? How it is that you can read or watch them from anywhere at any time?

If you stored your data on a single hard disk, like the one in your personal computer, then the disk would eventually fail and your data would be lost forever. If you want to protect your data from the possibility of such a failure, you can store copies across many different disks so that if any one fails then you just access the data from another.

However, once storage systems get large enough, anything and everything can and does go wrong. You have to plan not just for disk failures but for server, network, and entire datacenter failures. Add to this software bugs and maintenance operations and you have a whole lot more failures.

Using measurements from dozens of Google data centers, we found that almost-simultaneous failure of many servers in a data center has the greatest impact on availability. On the other hand, disk failures have relatively little impact because our systems are specifically designed to cope with these failures.

Once you have a model of failures, you can also look at the impact of various design choices. Where exactly should you place your data replicas? How fast do you need recover from losing a disk or server? What encoding scheme or number of replicas of the data is enough, given a desired level of availability? For example, we found that storing data across multiple data centers reduces data unavailability by many orders of magnitude compared to having the same number of replicas in a single data center. The added complexity and potential for slower recovery times is worth it to get better availability, or use less storage space, or even both at the same time.

As you can see, something as simple as storing your photos, mail, or videos becomes a lot more involved when you want to be sure it's always available.

In our paper, Availability in Globally Distributed Storage Systems, we characterize the availability of cloud storage systems, based on extensive monitoring of Google's main storage infrastructure, and the sources of failure which affect availability. We also present statistical models for reasoning about the impact of design choices such as data placement, recovery speed, and replication strategies, including replication across multiple data centers.
Email ThisBlogThis!Share to XShare to Facebook
Posted in Publications | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • CDC Birth Vital Statistics in BigQuery
    Posted by Dan Vanderkam, Software Engineer Google’s BigQuery Service lets enterprises and developers crunch large-scale data sets quickly...
  • Towards Energy-Proportional Datacenters
    Posted by Dennis Abts, Michael R. Marty, Philip M. Wells, Peter Klausler, and Hong Liu This is part of the series highlighting some notable...
  • Site Reliability Engineers: “solving the most interesting problems”
    Posted by Chris Reid, Sydney Staffing team I recently sat down with Ben Appleton, a Senior Staff Software Engineer, to talk about his recent...
  • Our Faculty Institute brings faculty back to the drawing board
    Posted by Nina Kim Schultz, Google Education Research Cross-posted with the Official Google Blog School may still be out for summer, but tea...
  • Market Algorithms and Optimization Meeting
    Posted by  Vahab S. Mirrokni and Muthu Muthukrishnan Google auctions ads, and enables a market with millions of advertisers and users.  This...
  • Our Unique Approach to Research
    Posted by  Alfred Spector , Vice President of Research and Special Initiatives Google started as a research project —and research has remain...
  • Impact of Organic Ranking on Ad Click Incrementality
    Posted by David Chan, Statistician and Lizzy Van Alstine, Research Evangelist  In 2011, Google released a Search Ads Pause research study w...
  • Large-scale graph computing at Google
    Posted by Grzegorz Czajkowski, Systems Infrastructure Team If you squint the right way, you will notice that graphs are everywhere. For exam...
  • Continuing the quest for future computer scientists with CS4HS
    Erin Mindell, Program Manager, Google Education Computer Science for High School (CS4HS) began five years ago with a simple question: How c...
  • Millions of Core-Hours Awarded to Science
    Posted by Andrea Held, Program Manager, University Relations In 2011 Google University Relations launched a new academic research awards pr...

Categories

  • accessibility
  • ACL
  • ACM
  • Acoustic Modeling
  • ads
  • adsense
  • adwords
  • Africa
  • Android
  • API
  • App Engine
  • App Inventor
  • Audio
  • Awards
  • Cantonese
  • China
  • Computer Science
  • conference
  • conferences
  • correlate
  • crowd-sourcing
  • CVPR
  • datasets
  • Deep Learning
  • distributed systems
  • Earth Engine
  • economics
  • Education
  • Electronic Commerce and Algorithms
  • EMEA
  • EMNLP
  • entities
  • Exacycle
  • Faculty Institute
  • Faculty Summit
  • Fusion Tables
  • gamification
  • Google Books
  • Google+
  • Government
  • grants
  • HCI
  • Image Annotation
  • Information Retrieval
  • internationalization
  • Interspeech
  • jsm
  • jsm2011
  • K-12
  • Korean
  • Labs
  • localization
  • Machine Hearing
  • Machine Learning
  • Machine Translation
  • MapReduce
  • market algorithms
  • Market Research
  • ML
  • MOOC
  • NAACL
  • Natural Language Processing
  • Networks
  • Ngram
  • NIPS
  • NLP
  • open source
  • operating systems
  • osdi
  • osdi10
  • patents
  • ph.d. fellowship
  • PiLab
  • Policy
  • Public Data Explorer
  • publication
  • Publications
  • renewable energy
  • Research Awards
  • resource optimization
  • Search
  • search ads
  • Security and Privacy
  • SIGMOD
  • Site Reliability Engineering
  • Speech
  • statistics
  • Structured Data
  • Systems
  • Translate
  • trends
  • TV
  • UI
  • University Relations
  • UNIX
  • User Experience
  • video
  • Vision Research
  • Visiting Faculty
  • Visualization
  • Voice Search
  • Wiki
  • wikipedia
  • WWW
  • YouTube

Blog Archive

  • ►  2013 (51)
    • ►  December (3)
    • ►  November (9)
    • ►  October (2)
    • ►  September (5)
    • ►  August (2)
    • ►  July (6)
    • ►  June (7)
    • ►  May (5)
    • ►  April (3)
    • ►  March (4)
    • ►  February (4)
    • ►  January (1)
  • ►  2012 (59)
    • ►  December (4)
    • ►  October (4)
    • ►  September (3)
    • ►  August (9)
    • ►  July (9)
    • ►  June (7)
    • ►  May (7)
    • ►  April (2)
    • ►  March (7)
    • ►  February (3)
    • ►  January (4)
  • ▼  2011 (51)
    • ►  December (5)
    • ►  November (2)
    • ►  September (3)
    • ►  August (4)
    • ►  July (9)
    • ►  June (6)
    • ►  May (4)
    • ►  April (4)
    • ►  March (5)
    • ▼  February (5)
      • Slicing and dicing data for interactive visualization
      • Where does my data live?
      • A Runtime Solution for Online Contention Detection...
      • Congratulations to Ken Thompson
      • Query Language Modeling for Voice Search
    • ►  January (4)
  • ►  2010 (44)
    • ►  December (7)
    • ►  November (2)
    • ►  October (9)
    • ►  September (7)
    • ►  August (2)
    • ►  July (7)
    • ►  June (3)
    • ►  May (2)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
    • ►  January (2)
  • ►  2009 (44)
    • ►  December (8)
    • ►  November (4)
    • ►  August (4)
    • ►  July (5)
    • ►  June (5)
    • ►  May (4)
    • ►  April (6)
    • ►  March (3)
    • ►  February (1)
    • ►  January (4)
  • ►  2008 (11)
    • ►  December (1)
    • ►  November (1)
    • ►  October (1)
    • ►  September (1)
    • ►  July (1)
    • ►  May (3)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
  • ►  2007 (9)
    • ►  October (1)
    • ►  September (2)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  February (2)
  • ►  2006 (15)
    • ►  December (1)
    • ►  November (1)
    • ►  September (1)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  April (3)
    • ►  March (4)
    • ►  February (1)
Powered by Blogger.

About Me

Unknown
View my complete profile