设为首页 收藏本站
查看: 525|回复: 0

[经验分享] TO BE A ZOOKEEPER

[复制链接]

尚未签到

发表于 2019-1-8 08:07:26 | 显示全部楼层 |阅读模式
  This article is about some specific things I learned about Apache ZooKeeper. Apache ZooKeeper is useful if you are writing a distributed application and need coordination between processes for things like configuration, group membership, leader election, locking and the like.
  ZooKeeper has quite good documentation, and I have provided pointers where applicable, but maybe writing from my own experience can help shorten someone else’s learning curve.
Connection loss and session expiration
  A central issue you need to think about when implementing ZooKeeper is how to deal with connection loss and session expiration. Since ZooKeeper is the central coordination service, once an application loses its connection with ZooKeeper it is left in an uncertain situation.
  To survive temporary network hickups, or the death of the particular ZooKeeper server you are connected to, ZooKeeper has the notion of a session timeout. If you are able to reconnect to one of the servers in the ZooKeeper ensemble within this timeout, the ephemeral nodes and watches created by the application will not be lost. Various reasons for connection problems are described in the Troubleshooting page on the wiki.
  ZooKeeper reports the loss of connection to the application as soon as it happens, through a Disconnected event sent to all watchers. Once the connection is reestablished, it reports a SyncConnected event. Even if the connection would only be lost for a very small amount of time, you will get these two events.
  If an application only succeeds to reconnect to a ZooKeeper server after the session timeout passed, it will get an Expired event. The important thing to note here is that this event is produced by the server. If the ZooKeeper client cannot connect to the server for any amount of time, even if it is much longer than the session timeout (like hours), your application will not get any event besides the initial Disconnected event. At first I did not understand why the ZooKeeper client cannot produce this event locally, if it has not been able to reestablish the connection within the session timeout. Turns out there is a reason for it (same here).

  How to deal with connection loss is pretty much a decision to be made for each application. If the application is just some sort of ‘worker’ it could pause until the connection is back. If the application is elected as leader, can it continue to assume it is the leader when it is not longer connect to ZooKeeper? See this post on active & passive leaders and also this one for some guidelines. If the application is a server, should it continue to accept client requests knowing it might be partitioned from other servers or not have the latest configuration information from ZooKeeper? These are the sorts of things you need to decide on. Here is another post suggesting some best practices. Once you delve into the mailing list archives you will find plenty of discussions>  Session expiration is less common than connection loss and harder to recover from, as you need to handle reconnection yourself by creating a new ZooKeeper handle. An easy solution is to simply restart the application at this point.
Dealing with ConnectionLoss exceptions
  When connection loss occurs, any running Zookeeper operation (like getChildren, get/setData, create, delete) will throw a ConnectionLoss exception. Somehow you have to deal with these. One way is to retry the operation until it succeeds (= until the connection is back). For example in the lock recipe found in the ZooKeeper source tree, they use this pattern, see the ProtocolSupport.retryOperation method. This technique is also present in the ZKClient library.
  When an operation throws a ConnectionLoss exception, you are not sure if it succeeded or not. The connection might have failed before or after it got through to the server. So you have to consider this when retrying the operation. A create might fail with a NodeExists exception, a delete might fail with a NoNode exception.
  Suppose you implement a ZooKeeper-based lock using the creation of ephemeral sequence nodes. If you retry the operation when it fails, you could have created two ephemeral nodes, but you will not know the name of the first. You can solve this by looping over the znodes and checking the Stat.ephemeralOwner field. If you would not retry the operation, and e.g. simply exit the lock-taking procedure, you might have left an epehemeral node and hence make it impossible for others to obtain the lock.
  The ErrorHandling page on the wiki contains some more detail on this topic.
Storing multiple properties in the data of one node
  Given ZooKeeper’s tree model, it might be tempting to store an entity as a znode with subnodes for each of the properties of the entity. This allows for fine-grained reading, writing, and watching.
  But with this approach you cannot create the entity atomically. As a consequence, clients can see it in a partially-created, partially-updated or partially-deleted state. If you want to watch the complete state of the entity, you would need to install watchers on all of the individual nodes. So at second sight, this becomes quite complex.
  Therefore, I found it easier to store certain configuration entities in their entirety in the data of one node. In my case, I stored it as JSON.
  Here is a mail about this topic.
  If you are in a situation where storing it all in one znode is not desirable, another technique is the one of the ready znode described in theZooKeeper paper.
The single event thread
  Each ZooKeeper handle (= ZooKeeper object instance) has one thread for dispatching the events to all the watchers. So watchers are called in sequence. If you perform some time consuming action in a watcher, all the other watchers will have to wait before they get a chance to process their event.

  Thus: do not do anything time-consuming in a watcher, such as IO, waiting on a lock, etc. One case to watch out for is not to do something which in itself might again wait for a ZooKeeper event. I made for example this mistake trying to take a ZooKeeper-based lock (of which the implementation details where in another>  If you use multiple ZooKeeper handles in your application, each has its own event dispatching thread, so you only have to worry about this for all the users on one particular ZooKeeper handle. It seems common practice to have just one ZooKeeper handle though.
  The event thread is also described in the Programmer’s Guide.
Registering the same watcher object multiple times for the same event
  If you register the same watcher object (= the exact same instance) for the same event multiple times, it will be called only once. Thus if you do:
Watcher w = ...  
zooKeeper.getChildren(“/foo”, w);
  
zooKeeper.getChildren(“/foo”, w);
  The watcher w will be called only once when the children change. See the Programmer’s Guide where it mentions “A watch object, or function/context pair, will only be triggered once for a given notification.”.

  I found this very handy in the following situation. Suppose you install a watch, it gets triggered, and in the Watcher callback you reinstall the same watch. Since watchers are one-time triggers this is a common thing to do. Now it could be that reinstalling the watch fails due to a ConnectionLoss exception. As mentioned earlier, if you get a ConnectionLoss exception, you do not know if the operation succeeded, so you do not know if installing the watch succeeded. And since you are in a watcher callback, doing something blocking like retrying the operation until it succeeds is not a good>  Therefore, in such case you can simply install the watcher everytime you get a SyncConnected event, without worrying that the watcher will then be called twice when an event occurs.
InterruptedException
  Many ZooKeeper’s API methods throw InterruptedException. This is a checked exception, so you are forced to do something with it. This might seem annoying at first, but InterruptedException is actually a very useful exception: when a method throws InterruptedException, it tells you there is a possibility to interrupt it.
  What is the best way to handle InterruptedException? Well, there is a nice article about this by Brian Goetz. Basically, you should make sure not to stop the interruption of the thread. Either by throwing on the InterruptedException, or if this is not possible, by re-enabling the interrupted flag (call Thread.currentThread().interrupt).
Miscellaneous

  •   To locally test your application against a ZooKeeper ensemble, for example to test how it reacts to session moves, be sure to check out zkconf: it enables to set up an ensemble in seconds. Do not forget to use odd numbers of servers (see ZooKeeper admin guide).
  •   As developer, be sure to also read the ZooKeeper Administrator’s Guide. There is useful information in there such as the “four letter word” commands you can use to query some information and metrics. Also check out zktop.



运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-660536-1-1.html 上篇帖子: ZooKeeper基本讲解 & 集群构建 & 常用操作指令 下篇帖子: zookeeper简单的集群配置!
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表