我通过试验观察到 Zookeeper的集群环境最好有3台以上的节点,如果只有2台,那么2台当中不管那台机器down掉,将只会剩下一个leader,那么如果有再有客户端连接上来,将无法工作,并且剩下的leader服务器会不断的抛出异常。内容如下:
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:347)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:381)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:674)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:611)
2010-11-15 00:31:52,031 – INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@683] – Notification time out: 12800
并且客户端连接时还会抛出这样的异常,说明连接被拒绝,并且等待一个socket连接新的连接,这里socket新的连接指的是zookeeper中的一个Follower。
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1000) Opening socket connection to server 192.168.50.211/192.168.50.211:2181
org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:908) Socket connection established to 192.168.50.211/192.168.50.211:2181, initiating session
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1118) Unable to read additional data from server sessionid 0×0, likely server has closed socket, closing socket connection and attempting reconnect
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1000) Opening socket connection to server localhost/127.0.0.1:2181
2010-11-15 13:31:56,626 WARN org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1120) Session 0×0 for server null, unexpected error, closing socket connection and attempting reconnect
先写到这里,全面了解Zookeeper不太容易,光看Apache Zookeeper官方的wiki和文档还不能对其有深入的了解,想阅读Zookeeper中的部分源代码。另外,本人目前正在学习和了解ing,欢迎大家与我交流,谢谢。
原文:http://www.javabloger.com/article/apache-zookeeper-hadoop.html