HRegionServer: ZooKeeper session expired

zhouu posted on 2017-4-19 07:14:24

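HBase has been unstable. After going through the logs, I have found two problems so far: one is the issue described in my previous blog post, and the other is a ZooKeeper problem.
The exceptions in my log were: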
2010-10-28 00:36:49,573 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x9d2be33dbe860005 to sun.nio.ch.SelectionKeyImpl@6655bb93
java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2010-10-28 00:36:49,602 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
java.net.SocketException: Transport endpoint is not connected
        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
        at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:658)
        at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:378)
        at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
2010-10-28 00:36:49,602 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
java.net.SocketException: Transport endpoint is not connected
        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
        at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:669)
        at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:386)
        at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
2010-10-28 00:36:49,622 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:549)
        at java.lang.Thread.run(Thread.java:636)
2010-10-28 00:36:49,703 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
2010-10-28 00:36:50,505 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /192.168.5.151:2181
2010-10-28 00:36:50,505 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel
2010-10-28 00:36:50,506 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2010-10-28 00:36:50,507 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x9d2be33dbe860005 to sun.nio.ch.SelectionKeyImpl@335819e4
java.io.IOException: Session Expired
        at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2010-10-28 00:36:50,507 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Expired, type: None, path: null
2010-10-28 00:36:50,507 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired
After that, the region server exits!
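To spot this pattern quickly in a long region server log, the "Got ZooKeeper event" lines can be pulled out with a small script. This is only a rough sketch: the regular expression is written against the log format pasted above, and the function names `zk_state_events` and `session_expired` are made up for this example, not part of HBase or ZooKeeper.

```python
import re

# Written against lines of this shape (format taken from the log above):
# 2010-10-28 00:36:50,507 INFO ...HRegionServer: Got ZooKeeper event, state: Expired, type: None, path: null
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Got ZooKeeper event, state: (?P<state>\w+),"
)


def zk_state_events(lines):
    """Return (timestamp, state) pairs for every ZooKeeper event line."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match.group("ts"), match.group("state")))
    return events


def session_expired(lines):
    """True if any ZooKeeper event in the log reports the Expired state."""
    return any(state == "Expired" for _, state in zk_state_events(lines))
```

Feeding it the lines of the region server log (for example via `open(path)`) gives the sequence of states, such as Disconnected followed by Expired in the log above.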
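By the way, here is a useful resource: http://wiki.apache.org/hadoop/Hbase/Troubleshooting
It covers some of the problems you are likely to run into. There is also a roundup thread worth mentioning: http://bbs.hadoopor.com/thread-71-1-1.html
Now for the fix:
Set a longer ZooKeeper session timeout. The default (zookeeper.session.timeout) is 60 seconds; see:
http://hbase.apache.org/docs/r0.20.6/hbase-conf.html
It works together with another setting, hbase.zookeeper.property.tickTime.
My settings are as follows: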

<property>
<name>zookeeper.session.timeout</name>
<value>90000</value>
</property>
<property>
<name>hbase.zookeeper.property.tickTime</name>
<value>9000</value>
</property>
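Why the two settings go together: a ZooKeeper server does not accept any session timeout a client asks for. With its default minSessionTimeout and maxSessionTimeout, it limits the value to between 2 and 20 times tickTime. The sketch below only illustrates that arithmetic; the function name `negotiated_timeout_ms` is made up for this example, and the 3000 ms tickTime in the second call is just an illustrative value.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms):
    """Clamp a requested session timeout to [2 * tickTime, 20 * tickTime].

    Assumes the server keeps the default minSessionTimeout and
    maxSessionTimeout, which are derived from tickTime in this way.
    """
    min_ms = 2 * tick_time_ms
    max_ms = 20 * tick_time_ms
    return max(min_ms, min(requested_ms, max_ms))


# Settings from this post: 90 s requested, 9 s tick, so the request fits.
print(negotiated_timeout_ms(90000, 9000))  # 90000
# Illustrative smaller tick: the same request is capped at 20 ticks.
print(negotiated_timeout_ms(90000, 3000))  # 60000
```

So raising zookeeper.session.timeout alone may have no effect if tickTime is too small for it.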
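For a while after this change the problem did not recur, though I cannot say whether it is fixed for good.
Also, read the points listed here carefully and make sure you understand them: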

Resolution

[*]Make sure you give plenty of RAM (in hbase-env.sh), the default of 1GB won't be able to sustain long running imports.
[*]Make sure you don't swap, the JVM never behaves well under swapping.
[*]Make sure you are not CPU starving the region server thread. For example, if you are running a mapreduce job using 6 CPU-intensive tasks on a machine with 4 cores, you are probably starving the region server enough to create longer garbage collection pauses.
[*]If you wish to increase the session timeout, add the following to your hbase-site.xml to increase the timeout from the default of 60 seconds to 120 seconds.

<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.property.tickTime</name>
<value>6000</value>
</property>

[*]Be aware that setting a higher timeout means that the regions served by a failed region server will take at least that amount of time to be transferred to another region server. For a production system serving live requests, we would instead recommend setting it lower than 1 minute and over-provisioning your cluster in order to lower the memory load on each machine (hence having less garbage to collect per machine).
[*]If this is happening during an upload which only happens once (like initially loading all your data into HBase), consider importing into HFiles directly.
[*]HBase ships with some GC tuning; for more information see Performance Tuning.
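Two of the points above (no swapping, no CPU starvation) are easy to check on a region server host. The following is a minimal sketch, assuming a Linux machine where /proc/meminfo is available; the helper names `swap_used_kb` and `cpu_oversubscribed` are made up for this example, and treating "as many CPU-bound tasks as cores" as oversubscribed is my own rule of thumb, not something from the wiki.

```python
import os


def swap_used_kb(meminfo_text):
    """Return swap in use (kB) given the text of Linux /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def cpu_oversubscribed(cpu_bound_tasks, cores=None):
    """True when CPU-bound tasks leave no core free for the region server.

    Assumption: the region server needs at least one core to itself.
    """
    cores = cores or os.cpu_count() or 1
    return cpu_bound_tasks >= cores


# The wiki's example: 6 CPU-intensive tasks on a 4-core machine.
print(cpu_oversubscribed(6, cores=4))  # True
```

On a real host, `swap_used_kb(open("/proc/meminfo").read())` returning anything well above zero is a hint that the JVM may be getting swapped out.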
