State-Machine Replication for Planetary-Scale Systems
February 27, 2019 at 1:30pm
Modern Internet services often replicate critical data across multiple geographical locations and maintain its consistency using state-machine replication (SMR). Unfortunately, classical SMR protocols have limited scalability in a geo-distributed setting due to their use of a leader replica or large quorums. We present the first SMR protocol, called Atlas, that scales with the number of replicas. The key observation used in the protocol is that data center failures in geo-distributed systems are rare. Hence, our protocol allows choosing the maximum number of failures independently of the overall number of replicas, and is optimized for small values of this bound. To achieve scalability, Atlas does not rely on a distinguished leader and uses quorums that shrink as the bound on failures decreases. We experimentally demonstrate the benefits of our protocol by evaluating it with up to 17 geographical sites spread across the world.
Alexey Gotsman is an Associate Research Professor at the IMDEA Software Institute in Madrid, Spain. He obtained his PhD from the University of Cambridge, UK. Alexey’s interests are at the intersection of distributed systems and formal verification. He has received best paper awards at PODC, DISC and CONCUR and is currently a holder of an ERC Starting Grant.