References:
ZooKeeper's consistency protocol: Zab
Chubby & ZooKeeper: principles and applications in distributed environments
Paxos vs. Viewstamped Replication vs. Zab
Zab vs. Paxos
Zab: High-performance broadcast for primary-backup systems
Chubby: a lock service for loosely coupled distributed systems
Understanding Chubby and ZooKeeper
ZooKeeper's leader election uses a Paxos-like algorithm, while data replication uses Zab (ZooKeeper Atomic Broadcast).
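Zab is designed for primary-backup systems: as machines are added to the cluster, write performance degrades somewhat while read performance scales horizontally. Consistency always comes with preconditions. Reads are generally served directly by a Follower, which does not guarantee the latest data, but for ZooKeeper's typical uses (configuration, distributed transactions, and similar workloads) this is consistent enough to be treated as strong consistency. ZooKeeper is not suited to write-heavy systems.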
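One way to see why writes get more expensive as the cluster grows is to count acknowledgements. The sketch below assumes simple majority quorums in which the leader counts its own vote; the function names are made up for illustration and are not ZooKeeper APIs.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumption: majority quorums)."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that may be down while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


# Every write must be acknowledged by a quorum, so the number of
# acknowledgements per write grows with the ensemble. A read is served by
# a single server, so adding servers adds read capacity.
for n in (3, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

Going from 3 to 7 servers raises the tolerated failures from 1 to 3, but each write then needs 4 acknowledgements instead of 2.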
The following description explains this more clearly: Zab is a different protocol than Paxos, although it shares with it some key aspects, as for example:
A leader proposes values to the followers
Leaders wait for acknowledgements from a quorum of followers before considering a proposal committed (learned)
Proposals include epoch numbers, which are similar to ballot numbers in Paxos
The main conceptual difference between Zab and Paxos is that it is primarily designed for primary-backup systems, like Zookeeper, rather than for state machine replication. Paxos can be used for primary-backup replication by letting the primary be the leader. The problem with Paxos is that, if a primary concurrently proposes multiple state updates and fails, the new primary may apply uncommitted updates in an incorrect order. An example is presented in our DSN 2011 paper (Figure 1). In the example, a replica should only apply the state update B after applying A. The example shows that, using Paxos, a new primary and its followers may apply B after C, reaching an incorrect state that has not been reached by any of the previous primaries.
A workaround to this problem using Paxos is to sequentially agree on state updates: a primary proposes a state update only after it commits all previous state updates. Since there is at most one uncommitted update at a time, a new primary cannot incorrectly reorder updates. This approach, however, results in poor performance. Zab does not need this workaround. Zab replicas can concurrently agree on the order of multiple state updates without harming correctness. This is achieved by adding one more synchronization phase during recovery compared to Paxos, and by using a different numbering of instances based on zxids.
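To make the zxid-based numbering concrete, here is a minimal sketch. It relies on the documented layout of a zxid as a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a per-epoch counter. The helper names and the sample proposals are hypothetical, and the recovery (synchronization) phase itself is not modelled.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack (epoch, counter) into one integer: epoch in the high 32 bits."""
    if not (0 <= epoch <= COUNTER_MASK and 0 <= counter <= COUNTER_MASK):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << COUNTER_BITS) | counter


def split_zxid(zxid: int) -> tuple[int, int]:
    """Recover (epoch, counter) from a packed zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


# Hypothetical history: the primary of epoch 1 proposes A and B, then a new
# primary takes over in epoch 2 and proposes C.
proposals = {
    "C": make_zxid(2, 1),
    "B": make_zxid(1, 2),
    "A": make_zxid(1, 1),
}

# Plain integer comparison orders all of epoch 1 before anything in epoch 2,
# and orders updates within an epoch by their counter.
apply_order = sorted(proposals, key=proposals.get)
print(apply_order)
```

Because the epoch occupies the high bits, any update proposed by a new primary compares greater than every update from earlier epochs, so replicas that apply updates in zxid order cannot place C before B.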