HBase RegionServer exits (ZooKeeper session expired)
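My RegionServers kept exiting with "ZooKeeper session expired", which gave me a headache for a long time. Here are the possible causes I have come up with:
1. The network is unreliable.
2. GC takes too long: the process is paused and the session lease expires.
3. The CPU is busy: the thread that maintains the ZooKeeper session does not get scheduled in time.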
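Causes 2 and 3 both look the same from inside the JVM: the process simply does not run for a while. As a rough way to check for that, here is a minimal sketch in which a thread sleeps for a fixed interval and measures how much longer the sleep really took. The class name `PauseDetector`, the 200 ms interval, and the 10 s warning threshold are all hypothetical values chosen for illustration; nothing here comes from HBase itself.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HBase: detects stalls of the whole process
// (long GC pauses, CPU starvation) by measuring how late a sleep wakes up.
class PauseDetector {

    // Sleeps for sleepMillis and returns how many extra milliseconds passed.
    public static long measureOvershootMillis(long sleepMillis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report what we measured so far.
            Thread.currentThread().interrupt();
        }
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return Math.max(0, elapsed - sleepMillis);
    }

    // True when the overshoot is large enough to be worth logging.
    public static boolean exceedsThreshold(long overshootMillis, long warnThresholdMillis) {
        return overshootMillis > warnThresholdMillis;
    }

    public static void main(String[] args) {
        long sleepMillis = 200;            // assumed polling interval
        long warnThresholdMillis = 10_000; // assumed threshold, tune to your session timeout
        for (int i = 0; i < 3; i++) {
            long overshoot = measureOvershootMillis(sleepMillis);
            if (exceedsThreshold(overshoot, warnThresholdMillis)) {
                System.out.println("Process stalled for about " + overshoot + " ms");
            }
        }
        System.out.println("done");
    }
}
```

If the session expires while no large overshoot is ever reported, the stall is probably not inside the JVM, and the network (cause 1) becomes the more likely suspect.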
Solutions:
[*]On the RS, set zookeeper.session.timeout to a larger value. I use 180000 (milliseconds, i.e. 3 minutes).
[*]On the RS, set hbase.regionserver.restart.on.zk.expire to true.
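As a sketch, the two settings above would look roughly like this in the RegionServer's hbase-site.xml. The property names and values are the ones from this post; placing them in hbase-site.xml is my assumption about where the RS reads them, and hbase.regionserver.restart.on.zk.expire is only known from the source code quoted in this post, so it may not exist in other HBase versions.

```xml
<configuration>
  <!-- ZooKeeper session timeout in milliseconds (value used in this post) -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>180000</value>
  </property>
  <!-- Restart instead of abort when the session expires;
       undocumented, taken from the source code quoted in this post -->
  <property>
    <name>hbase.regionserver.restart.on.zk.expire</name>
    <value>true</value>
  </property>
</configuration>
```

Note that the ZooKeeper server caps the timeout a client can negotiate: since ZooKeeper 3.3.0, maxSessionTimeout defaults to 20 times tickTime, so a requested 180000 ms may be lowered unless the server side is adjusted as well.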
For reference, here is the relevant source code:
/**
 * We register ourselves as a watcher on the master address ZNode. This is
 * called by ZooKeeper when we get an event on that ZNode. When this method
 * is called it means either our master has died, or a new one has come up.
 * Either way we need to update our knowledge of the master.
 * @param event WatchedEvent from ZooKeeper.
 */
public void process(WatchedEvent event) {
  EventType type = event.getType();
  KeeperState state = event.getState();
  LOG.info("Got ZooKeeper event, state: " + state + ", type: " +
      type + ", path: " + event.getPath());
  // Ignore events if we're shutting down.
  if (stopRequested.get()) {
    LOG.debug("Ignoring ZooKeeper event while shutting down");
    return;
  }
  if (state == KeeperState.Expired) {
    LOG.error("ZooKeeper session expired");
    boolean restart =
        this.conf.getBoolean("hbase.regionserver.restart.on.zk.expire", false);
    if (restart) {
      restart();
    } else {
      abort();
    }
  } else if (type == EventType.NodeDeleted) {
    watchMasterAddress();
  } else if (type == EventType.NodeCreated) {
    getMaster();
    // ZooKeeper watches are one time only, so we need to re-register our watch.
    watchMasterAddress();
  }
}
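As the code shows, when the session expires the RS calls restart() if hbase.regionserver.restart.on.zk.expire is set to true, and abort() otherwise (the default is false). Setting it to true therefore keeps the RS from killing itself. However, I could not find hbase.regionserver.restart.on.zk.expire anywhere in the official documentation.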