g87616758 posted on 2017-4-19 09:45:19

HBase RegionServer exits (ZooKeeper session expired)

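My RegionServers kept exiting because their ZooKeeper session expired, which gave me a headache for a long time. Here are the possible causes I've collected:
1. A poor network.
2. GC pauses that are too long: the JVM stops the program, so the session (lease) expires.
3. A busy CPU: the thread that keeps the ZooKeeper session alive does not get scheduled in time.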
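All three causes come down to the same thing: the ZooKeeper server stops hearing from the RS for longer than the session timeout. Here is a simplified model, assuming the session is established at t = 0 and that heartbeats reach the server at the listed times. The real server tracks expiry in tickTime-sized buckets, so this is only an approximation, and the class and method names are hypothetical:

```java
import java.util.List;

/** Hypothetical, simplified model of ZooKeeper session expiry. */
class SessionExpiryModel {
    /**
     * Returns true if any silence (gap between heartbeats, or from the last
     * heartbeat until nowMs) is longer than the session timeout.
     */
    static boolean expires(List<Long> heartbeatTimesMs, long nowMs, long sessionTimeoutMs) {
        long last = 0; // session assumed to be established at t = 0
        for (long t : heartbeatTimesMs) {
            if (t - last > sessionTimeoutMs) {
                return true; // e.g. a long GC pause or lost packets
            }
            last = t;
        }
        return nowMs - last > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long timeout = 40_000;
        // Heartbeats every 10 s: the session stays alive.
        System.out.println(expires(List.of(0L, 10_000L, 20_000L, 30_000L), 35_000, timeout));
        // A 50 s stop-the-world pause between the 2nd and 3rd heartbeat: expired.
        System.out.println(expires(List.of(0L, 10_000L, 60_000L), 65_000, timeout));
        // The same pause with a 180 s timeout is tolerated.
        System.out.println(expires(List.of(0L, 10_000L, 60_000L), 65_000, 180_000));
    }
}
```

The last call shows why a longer session timeout helps: the pause itself is unchanged, only the tolerance grows.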
Solutions:


1. On the RS, set zookeeper.session.timeout to a larger value; I use 180000 (ms, i.e. 3 minutes).
2. On the RS, set hbase.regionserver.restart.on.zk.expire to true.
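One caveat for option 1: as far as I know, the ZooKeeper server does not simply accept the timeout a client requests. It clamps it to [minSessionTimeout, maxSessionTimeout], which default to 2 * tickTime and 20 * tickTime. With the common tickTime of 2000 ms the cap is 40 s, so 180000 only takes full effect if the ZooKeeper ensemble's maxSessionTimeout (or tickTime) is raised as well. The sketch below models just that clamping; the class and method names are made up for illustration and are not part of ZooKeeper's API:

```java
/**
 * Hypothetical sketch of how a ZooKeeper server bounds the session timeout
 * that a client (here: the RegionServer) asks for.
 */
class SessionTimeoutNegotiation {
    /**
     * @param requestedMs       timeout the client asks for (zookeeper.session.timeout)
     * @param tickTimeMs        the ZooKeeper server's tickTime
     * @param minSessionTimeout server minSessionTimeout in ms, or -1 for the default (2 * tickTime)
     * @param maxSessionTimeout server maxSessionTimeout in ms, or -1 for the default (20 * tickTime)
     * @return the timeout the session actually gets
     */
    static int negotiate(int requestedMs, int tickTimeMs, int minSessionTimeout, int maxSessionTimeout) {
        int min = minSessionTimeout == -1 ? 2 * tickTimeMs : minSessionTimeout;
        int max = maxSessionTimeout == -1 ? 20 * tickTimeMs : maxSessionTimeout;
        int timeout = requestedMs;
        if (timeout < min) {
            timeout = min; // raised to the lower bound
        }
        if (timeout > max) {
            timeout = max; // capped at the upper bound
        }
        return timeout;
    }

    public static void main(String[] args) {
        // tickTime = 2000 ms with default bounds: a 180000 ms request is capped at 40000.
        System.out.println(negotiate(180_000, 2_000, -1, -1));
        // Server configured with maxSessionTimeout = 180000: the request is honored.
        System.out.println(negotiate(180_000, 2_000, -1, 180_000));
    }
}
```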
For option 2, here is the relevant source code:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.Watcher.Event.KeeperState;

// Excerpt from the RegionServer's ZooKeeper Watcher callback. conf, stopRequested,
// LOG, restart(), abort(), watchMasterAddress() and getMaster() are members of the
// enclosing class and are not shown here.

/**
 * We register ourselves as a watcher on the master address ZNode. This is
 * called by ZooKeeper when we get an event on that ZNode. When this method
 * is called it means either our master has died, or a new one has come up.
 * Either way we need to update our knowledge of the master.
 * @param event WatchedEvent from ZooKeeper.
 */
public void process(WatchedEvent event) {
  EventType type = event.getType();
  KeeperState state = event.getState();
  LOG.info("Got ZooKeeper event, state: " + state + ", type: " +
      type + ", path: " + event.getPath());
  // Ignore events if we're shutting down.
  if (stopRequested.get()) {
    LOG.debug("Ignoring ZooKeeper event while shutting down");
    return;
  }
  if (state == KeeperState.Expired) {
    LOG.error("ZooKeeper session expired");
    // Defaults to false, so without this setting the RegionServer aborts.
    boolean restart =
        this.conf.getBoolean("hbase.regionserver.restart.on.zk.expire", false);
    if (restart) {
      restart();
    } else {
      abort();
    }
  } else if (type == EventType.NodeDeleted) {
    watchMasterAddress();
  } else if (type == EventType.NodeCreated) {
    getMaster();
    // ZooKeeper watches are one time only, so we need to re-register our watch.
    watchMasterAddress();
  }
}
```
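As you can see, when hbase.regionserver.restart.on.zk.expire is true the RS calls restart(); otherwise it calls abort(). Setting it to true therefore keeps the RS from killing itself. However, I could not find hbase.regionserver.restart.on.zk.expire anywhere in the official documentation.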
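Note that restart() cannot revive the old session: once ZooKeeper has expired a session it is gone for good (its ephemeral znodes are deleted too), so a restart has to open a new one. Below is a stdlib-only sketch of the two policies. Session, ExpiryPolicySketch and onSessionExpired() are hypothetical names, not HBase code, and the boolean merely stands in for hbase.regionserver.restart.on.zk.expire:

```java
/** Hypothetical sketch of the restart-vs-abort choice made in process(). */
class ExpiryPolicySketch {
    /** Hypothetical session handle; a real one would wrap a ZooKeeper client. */
    static final class Session {
        final int id;
        Session(int id) { this.id = id; }
    }

    private final boolean restartOnExpire; // stands in for hbase.regionserver.restart.on.zk.expire
    private int sessionsOpened = 0;
    Session session;
    boolean aborted = false;

    ExpiryPolicySketch(boolean restartOnExpire) {
        this.restartOnExpire = restartOnExpire;
        this.session = openSession();
    }

    private Session openSession() {
        sessionsOpened++;
        return new Session(sessionsOpened);
    }

    /** Called when the ZooKeeper session is reported as expired. */
    void onSessionExpired() {
        if (restartOnExpire) {
            // The expired session cannot be reused: open a brand-new one.
            // A real server would also re-create its znodes and watches here.
            session = openSession();
        } else {
            // Default behaviour: give up; the process would shut down here.
            session = null;
            aborted = true;
        }
    }

    public static void main(String[] args) {
        ExpiryPolicySketch restarting = new ExpiryPolicySketch(true);
        restarting.onSessionExpired();
        System.out.println(restarting.aborted + " " + restarting.session.id); // false 2

        ExpiryPolicySketch aborting = new ExpiryPolicySketch(false);
        aborting.onSessionExpired();
        System.out.println(aborting.aborted); // true
    }
}
```

The flag only changes what happens after the session has expired; it does not make expiry any less likely, which is why option 1 is still worth doing.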