Data model and the hierarchical namespace
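The namespace provided by ZooKeeper is much like that of a standard file system. A name is a path: a sequence of path elements separated by a slash (/). Every znode in ZooKeeper's namespace is uniquely identified by a path. Every znode except the root (/) has a parent, and the parent's path is a prefix of the child's path with one less path element. Also as in a standard file system, a znode cannot be deleted while it still has children.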
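The path rules above can be illustrated with a short sketch. The helper names `split_path` and `parent_path` are hypothetical and not part of any ZooKeeper client library, and the validation shown covers only the rules stated in this section (an absolute path with slash-separated, non-empty elements).

```python
from typing import List, Optional


def split_path(path: str) -> List[str]:
    """Split an absolute znode-style path into its path elements.

    Hypothetical helper: it checks only that the path is absolute and
    has no empty elements, which is all this section describes.
    """
    if not path.startswith("/"):
        raise ValueError(f"path must start with '/': {path!r}")
    if path == "/":
        return []  # the root has no path elements
    elements = path[1:].split("/")
    if any(element == "" for element in elements):
        raise ValueError(f"path contains an empty element: {path!r}")
    return elements


def parent_path(path: str) -> Optional[str]:
    """Return the parent's path, or None for the root.

    The parent's path is the child's path with the last element removed,
    so it is always a prefix of the child's path.
    """
    elements = split_path(path)
    if not elements:
        return None  # the root is the only node without a parent
    return "/" + "/".join(elements[:-1])
```

For a path such as `/app1/p_1`, `parent_path` yields `/app1`, and applying it again yields the root `/`.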
ZooKeeper's Hierarchical Namespace
Nodes and ephemeral nodes
Timeliness - The clients' view of the ZooKeeper service is guaranteed to be up to date within a certain time bound.
For more information on these guarantees, and how they can be used, see [tbd]
Simple API
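One of ZooKeeper's design goals is to provide a very simple programming interface. As a result, it supports only these operations:
create
creates a node at a given path
delete
deletes a node
exists
tests whether a node exists at a given path
get data
reads the data from a node (a byte array)
set data
writes data to a node (overwriting the existing data, not appending to it)
get children
retrieves the list of children of a node
sync
waits for data to be propagated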
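To make the semantics of the operations listed above concrete, here is a minimal in-memory sketch of a znode tree. It is not the ZooKeeper client API: the class `InMemoryTree`, its method names, and the exception classes are all hypothetical. It runs in a single process with no replication, so `sync` is left out because there is nothing to propagate.

```python
from typing import Dict, List


class NodeExistsError(Exception):
    """Hypothetical error: a node already exists at the path."""


class NoNodeError(Exception):
    """Hypothetical error: no node exists at the path (or at its parent)."""


class NotEmptyError(Exception):
    """Hypothetical error: the node still has children."""


class InMemoryTree:
    """Toy, single-process model of the znode namespace (illustration only)."""

    def __init__(self) -> None:
        # Map each full path to its data; the root always exists.
        self._data: Dict[str, bytes] = {"/": b""}

    @staticmethod
    def _parent(path: str) -> str:
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path: str, data: bytes = b"") -> None:
        if path in self._data:
            raise NodeExistsError(path)
        if self._parent(path) not in self._data:
            raise NoNodeError(self._parent(path))
        self._data[path] = data

    def delete(self, path: str) -> None:
        if path == "/":
            raise ValueError("the root cannot be deleted in this sketch")
        if self.get_children(path):
            raise NotEmptyError(path)  # nodes with children cannot be deleted
        del self._data[path]

    def exists(self, path: str) -> bool:
        return path in self._data

    def get_data(self, path: str) -> bytes:
        if path not in self._data:
            raise NoNodeError(path)
        return self._data[path]

    def set_data(self, path: str, data: bytes) -> None:
        if path not in self._data:
            raise NoNodeError(path)
        self._data[path] = data  # replaces the old data; never appends

    def get_children(self, path: str) -> List[str]:
        if path not in self._data:
            raise NoNodeError(path)
        prefix = path if path.endswith("/") else path + "/"
        names = (p[len(prefix):] for p in self._data
                 if p != path and p.startswith(prefix))
        # Keep direct children only: their remaining name has no slash.
        return sorted(name for name in names if "/" not in name)
```

In this sketch, creating `/app1` and then `/app1/p_1` succeeds, while deleting `/app1` before `/app1/p_1` raises `NotEmptyError`, mirroring the rule that a znode with children cannot be deleted.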
For a more in-depth discussion of these operations, and how they can be used to implement higher-level operations, see [tbd]
Implementation
The component diagram below shows the high-level components of the ZooKeeper service. With the exception of the request processor, each of the servers that make up the ZooKeeper service replicates its own copy of each of the components.
Uses
The programming interface to ZooKeeper is deliberately simple. With it, however, you can implement higher-order operations, such as synchronization primitives, group membership, ownership, etc. Some distributed applications have used it to: [tbd: add uses from white paper and video presentation.] For more information, see [tbd]
Performance
ZooKeeper is designed to be highly performant. But is it? The results from the ZooKeeper development team at Yahoo! Research indicate that it is. (See ZooKeeper Throughput as the Read-Write Ratio Varies.) It performs especially well in applications where reads outnumber writes, since writes involve synchronizing the state of all servers. (Reads outnumbering writes is typically the case for a coordination service.)
ZooKeeper Throughput as the Read-Write Ratio Varies
The figure ZooKeeper Throughput as the Read-Write Ratio Varies is a throughput graph of ZooKeeper release 3.2 running on servers with dual 2 GHz Xeon processors and two SATA 15K RPM drives. One drive was used as a dedicated ZooKeeper log device. The snapshots were written to the OS drive. Write requests were 1K writes and the reads were 1K reads. "Servers" indicates the size of the ZooKeeper ensemble, that is, the number of servers that make up the service. Approximately 30 other servers were used to simulate the clients. The ZooKeeper ensemble was configured such that leaders do not allow connections from clients.
Note
In version 3.2 r/w performance improved by ~2x compared to the previous 3.1 release.
Benchmarks also indicate that it is reliable. Reliability in the Presence of Errors shows how a deployment responds to various failures. The events marked in the figure are the following:
Failure and recovery of a follower
Failure and recovery of a different follower
Failure of the leader
Failure and recovery of two followers
Failure of another leader
Reliability
To show the behavior of the system over time as failures are injected, we ran a ZooKeeper service made up of 7 machines. We ran the same saturation benchmark as before, but this time we kept the write percentage at a constant 30%, which is a conservative ratio for our expected workloads.
Reliability in the Presence of Errors
There are a few important observations from this graph. First, if followers fail and recover quickly, then ZooKeeper is able to sustain a high throughput despite the failure. Second, and maybe more importantly, the leader election algorithm allows the system to recover fast enough to prevent throughput from dropping substantially. In our observations, ZooKeeper takes less than 200 ms to elect a new leader. Third, as followers recover, ZooKeeper is able to raise throughput again once they start processing requests.
The ZooKeeper Project
ZooKeeper has been successfully used in many industrial applications. It is used at Yahoo! as the coordination and failure recovery service for Yahoo! Message Broker, which is a highly scalable publish-subscribe system managing thousands of topics for replication and data delivery. It is used by the Fetching Service for Yahoo! crawler, where it also manages failure recovery. A number of Yahoo! advertising systems also use ZooKeeper to implement reliable services.
All users and developers are encouraged to join the community and contribute their expertise. See the ZooKeeper Project on Apache for more information.