[Experience Sharing] Common Hadoop/Hive problems and solutions (continuously updated)

Posted 2018-10-28 12:29:33
  During installation, a network interruption caused the problems below.
  Problem 1: installation hangs at "Acquiring installation lock"
  /tmp/scm_prepare_node.tYlmPfrT
  using SSH_CLIENT to get the SCM hostname: 172.16.77.20 33950 22
  opening logging file descriptor
  Starting installation script... Acquiring installation lock... BEGIN flock 4
  This stage took about half an hour the first time; after one uninstall it took almost an hour before it finally completed.
  Problem 2: cannot select hosts
  After a failed installation, the hosts could not be selected again.
  (Figure 1)
  Solution: clean up the files left behind by the failed installation.
  Uninstall Cloudera Manager 5.1.x and the related software (see the official uninstall documentation).
  Problem 3: DNS reverse lookup (PTR) resolves to localhost
  Description:
  A DNS reverse-lookup error prevents the Cloudera Manager Server hostname from being resolved correctly.
  Log:
  Detecting Cloudera Manager Server...
  BEGIN host -t PTR 192.168.1.198
  198.1.168.192.in-addr.arpa domain name pointer localhost.
  END (0)
  using localhost as scm server hostname
  BEGIN which python
  /usr/bin/python
  END (0)
  BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' localhost 7182
  Traceback (most recent call last):
  File "", line 1, in
  File "", line 1, in connect
  socket.error: [Errno 111] Connection refused
  END (1)
  could not contact scm server at localhost:7182, giving up
  waiting for rollback request
  Solution:
  On the machines that cannot connect, move the /usr/bin/host binary out of the way:


  • sudo mv /usr/bin/host /usr/bin/host.bak
  Note:
  Cloudera's rationale here is unclear: the installer already has the Cloudera Manager Server IP, yet it still resolves the IP back to a hostname before connecting.
  Because reverse DNS was not configured, resolving the Cloudera Manager Server IP returned localhost, which caused the subsequent connection failure.
  The workaround is simply to remove /usr/bin/host; Cloudera Manager then connects by IP directly and the error disappears.
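  The failing installer step above is nothing more than a TCP connect to the SCM port. A minimal self-contained sketch of the same probe (the host and port in the usage comment are illustrative, not values confirmed for this cluster):

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the Cloudera Manager Server port directly by IP,
# bypassing any reverse-DNS lookup:
#   can_connect("172.16.77.20", 7182)
```

  Running this by IP from the failing host tells you whether the problem is DNS (as in this case) or a genuinely unreachable port.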
  Problem 4: NTP
  Description:
  Bad Health -- Clock Offset
  The host's NTP service did not respond to a request for the clock offset.
  Solution:
  Configure the NTP service.
  Reference guides:
  Configuring an NTP server on CentOS:
  http://www.hailiangchen.com/centos-ntp/
  Commonly used NTP servers in China (addresses and IPs):
  http://www.douban.com/note/171309770/
  Edit the configuration file:
  [root@work03 ~]# vim /etc/ntp.conf
  # Use public servers from the pool.ntp.org project.
  # Please consider joining the pool (http://www.pool.ntp.org/join.html).
  server s1a.time.edu.cn prefer
  server s1b.time.edu.cn
  server s1c.time.edu.cn
  restrict 172.16.1.0 mask 255.255.255.0 nomodify
  Problem 2.2
  Description:
  Clock Offset
  ·  Ensure that the host's hostname is configured properly.
  ·  Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
  ·  Ensure that ports 9000 and 9001 are free on the host being added.
  ·  Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
  Diagnosis:
  Run 'ntpdc -c loopinfo' on the affected hosts (work02, work03):
  [root@work03 work]# ntpdc -c loopinfo
  ntpdc: read: Connection refused
  Solution:
  Start the NTP service on all three machines and enable it at boot:
  service ntpd start
  chkconfig ntpd on
  Problem 5: heartbeat
  Error message:
  Installation failed. Failed to receive heartbeat from agent.
  Solution: disable the firewall.
  Problem 6: Unknown Health
  Unknown Health
  After a reboot: Request to the Host Monitor failed.
  service --status-all | grep clo
  Checking the agent status on the machine shows: cloudera-scm-agent dead but pid file exists
  Solution: restart the services
  service cloudera-scm-agent restart
  service cloudera-scm-server restart
  Problem 7: canonical name / hostname consistency
  Bad Health
  The hostname and canonical name for this host are not consistent when checked from a Java process.
  canonical name:
  4092 Monitor-HostMonitor throttling_logger WARNING (29 skipped) hostname work02 differs from the canonical name work02.xinzhitang.com
  Solution: edit /etc/hosts so that the FQDN and the hostname match.
  PS: this clears the check, but it is not obvious why the hostname and its canonical name must be identical.
  /etc/hosts
  192.168.1.185 work01 work01
  192.168.1.141 work02 work02
  192.168.1.198 work03 work03
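  The health check above compares the host's short name with its canonical (fully qualified) name. A rough stand-in for that comparison using the standard library resolver (an approximation; the actual check runs inside a Java process):

```python
import socket

def hostname_consistency():
    """Return (hostname, canonical_name, consistent?) for the local host."""
    hostname = socket.gethostname()
    canonical = socket.getfqdn()
    # After the /etc/hosts fix, the short names should agree
    # (e.g. work02 vs work02 rather than work02 vs work02.xinzhitang.com).
    consistent = hostname.split(".")[0] == canonical.split(".")[0]
    return hostname, canonical, consistent
```

  Running this on each host before adding it to the cluster catches the mismatch without waiting for the Host Monitor warning.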
  Problem 8: Concerning Health
  Concerning Health Issue
  -- Network Interface Speed --
  Description: The host has 2 network interface(s) that appear to be operating at less than full speed. Warning threshold: any.
  Details:
  This is a host health test that checks for network interfaces that appear to be operating at less than full speed.
  A failure of this health test may indicate that network interface(s) may be configured incorrectly and may be causing performance problems. Use the ethtool command to check and configure the host's network interfaces to use the fastest available link speed and duplex mode.
  Workaround:
  In this case the Cloudera Manager threshold was changed in the configuration, which silences the warning rather than truly fixing it.
  Problem 10: IOException thrown while collecting data from host: No route to host
  Cause: the firewall is enabled on the agent host.
  Solution: service iptables stop
  Problem 11
  Cloudera recommends setting /proc/sys/vm/swappiness to 0. Current setting is 60. Use the sysctl command to change this setting at runtime and edit /etc/sysctl.conf for this setting to be saved after a reboot. You may continue with installation, but you may run into issues with Cloudera Manager reporting that your hosts are unhealthy because they are swapping. The following hosts are affected:
  Solution:
  echo 0 > /proc/sys/vm/swappiness            (applies immediately)
  sysctl -w vm.swappiness=0                   (also applies at runtime; add "vm.swappiness = 0" to /etc/sysctl.conf to persist across reboots)
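  A quick way to verify the current value from a script (a sketch; /proc/sys/vm/swappiness only exists on Linux, so the helper returns None elsewhere):

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness as an int, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

# After the fix above, read_swappiness() should report 0 on every host.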
  Problem 12: clocks out of sync (sync against the USTC time server 202.141.176.110)
  echo "0 3 * * * /usr/sbin/ntpdate 202.141.176.110; /sbin/hwclock -w" >> /var/spool/cron/root
  service crond restart
  ntpdate 202.141.176.110
  Problem 13: The host's NTP service did not respond to a request for the clock offset.
  # service ntpd start
  ntpdc -c loopinfo   (the health will be good if this command executes successfully)
  Problem 14: The Cloudera Manager Agent is not able to communicate with this role's web server.
  One possible cause is that the metastore database cannot be reached; check the database configuration.
  Problem 15: Hive Metastore Server fails to start; update the Hive metastore database configuration (required whenever the hostname is changed).
  Troubleshooting approach
  For ordinary errors, read the error output and search for the key phrases.
  For unexplained failures (e.g. a namenode or datanode dying for no obvious reason), check the Hadoop logs ($HADOOP_HOME/logs) or the Hive logs.
  Hadoop errors
  Problem 16: datanode fails to start
  After adding a datanode, it would not start and the process kept dying. The namenode log showed:
  2013-06-21 18:53:39,182 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node x.x.x.x:50010 is attempting to report storage ...
  Cause:
  The hadoop installation directory was copied to the new node including the data and tmp folders (see my earlier hadoop installation write-up), so the datanode was never formatted cleanly.
  Fix: delete the stale directories, then start the datanode again:
  rm -rf /data/hadoop/hadoop-1.1.2/data
  rm -rf /data/hadoop/hadoop-1.1.2/tmp
  hadoop-daemon.sh start datanode
  Problem 17: safe mode
  2013-06-20 10:35:43,758 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew lease for DFSClient_hb_rs_wdev1.corp.qihoo.net,60020,1371631589073. Name node is in safe mode.
  Solution:
  hadoop dfsadmin -safemode leave
  Problem 18: connection exception
  2013-06-21 19:55:05,801 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to homename/x.x.x.x:9000 failed on local exception: java.io.EOFException
  Possible causes:
  the namenode is listening on 127.0.0.1:9000 rather than 0.0.0.0:9000 or the external IP
  iptables is blocking the port
  Solutions:
  check /etc/hosts so that the hostname is bound to an IP other than 127.0.0.1
  open the port in iptables
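  The first cause above is easy to detect mechanically: the hostname appears on a loopback line in /etc/hosts. A small sketch that flags this (the file path and hostname in the usage comment are examples):

```python
def loopback_bound(hosts_text, hostname):
    """True if hostname maps to a 127.x.x.x address in /etc/hosts content."""
    for line in hosts_text.splitlines():
        line = line.split("#")[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if hostname in names and ip.startswith("127."):
            return True
    return False

# Usage: loopback_bound(open("/etc/hosts").read(), "work03")
```

  If this returns True, the namenode will bind 9000 on loopback only, and datanodes on other machines get exactly the EOFException shown above.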

  Problem 19: namenode namespaceID mismatch
  ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /var/lib/hadoop-0.20/cache/hdfs/dfs/data: namenode namespaceID = 240012870; datanode namespaceID = 1462711424
  Problem: the namespaceID on the namenode does not match the namespaceID on the datanode.
  Cause: every namenode format creates a new namespaceID, while tmp/dfs/data still holds the ID from the previous format. Formatting clears the namenode's data but not the datanode's, so the two IDs diverge and startup fails.
  Fix: http://blog.csdn.net/wh62592855/archive/2010/07/21/5752199.aspx describes two methods; we used the first:
  (1) Stop the cluster services.
  (2) On the affected datanodes, delete the data directory, i.e. the dfs.data.dir configured in hdfs-site.xml (on this machine /var/lib/hadoop-0.20/cache/hdfs/dfs/data/). (Note: we ran this step on all datanodes and the namenode. In case the deletion does not help, keep a copy of the data directory first.)
  (3) Format the namenode.
  (4) Restart the cluster.
  That solved the problem.
  One side effect of this method is that all data on HDFS is lost. If HDFS holds important data, do not use it; try the second method from the link above instead.
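  The two IDs in the error come from the VERSION files under the namenode's and datanode's storage directories, so they can be compared before resorting to a destructive reformat. A sketch (the paths in the usage comment are examples):

```python
def read_namespace_id(version_text):
    """Extract namespaceID from the content of a dfs VERSION file."""
    for line in version_text.splitlines():
        if line.startswith("namespaceID="):
            return int(line.split("=", 1)[1])
    return None  # no namespaceID line found

# Usage (example paths):
#   nn = read_namespace_id(open("/dfs/name/current/VERSION").read())
#   dn = read_namespace_id(open("/var/lib/hadoop-0.20/cache/hdfs/dfs/data/current/VERSION").read())
#   print("match" if nn == dn else "mismatch")
```

  If the IDs already match, the datanode failure has some other cause and deleting the data directory would be pointless data loss.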
  Problem 20: directory permissions
  start-dfs.sh runs without errors and reports the datanode as started, but afterwards no datanode process exists. The log on the datanode machine shows the dfs.data.dir permissions are wrong:
  expected: drwxr-xr-x, current: drwxrwxr-x
  Fix:
  Check the dfs.data.dir configuration and correct the directory permissions (chmod 755).
  Hive errors
  Problem 21: NoClassDefFoundError
  Could not initialize class ...
  Fix: add protobuf-*.jar to the auxiliary jars path:
  // $HIVE_HOME/conf/hive-site.xml
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar</value>
  </property>
  Problem 22: hive dynamic partition error
  [Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode
  Fix:
  hive> set hive.exec.max.dynamic.partitions.pernode=10000;
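  Before simply raising the limit, it can help to know how many distinct partition values an insert will actually create, since each distinct value becomes one dynamic partition per node. A sketch that counts them from sample rows (the column name "dt" is an example):

```python
def count_dynamic_partitions(rows, key="dt"):
    """Count distinct values of the partition column among the sampled rows."""
    return len({row[key] for row in rows})

rows = [{"dt": "2013-06-01"}, {"dt": "2013-06-02"}, {"dt": "2013-06-01"}]
# Two distinct dates -> at most two dynamic partitions created per node.
```

  If the count is far above the limit, the partitioning scheme itself (e.g. partitioning on a high-cardinality column) is usually the real problem.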
  

  Problem 23: mapreduce process exceeds the memory limit -- hadoop Java heap space
  Add to mapred-site.xml:
  // mapred-site.xml
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
  Also raise the daemon heap size:
  # $HADOOP_HOME/conf/hadoop-env.sh
  export HADOOP_HEAPSIZE=5000
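  A tiny helper to sanity-check JVM heap flags like the -Xmx2048m above by converting them to bytes, e.g. when auditing several nodes' configs (a sketch; only the k/m/g suffixes are handled):

```python
def xmx_bytes(opt):
    """Convert a -Xmx JVM option (e.g. '-Xmx2048m') to bytes."""
    value = opt.lower()[len("-xmx"):]
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)  # no suffix: plain bytes
```

  For example, xmx_bytes("-Xmx2048m") equals 2 GiB, which is what each map/reduce child JVM may claim under the setting above.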
  Problem 24: hive created-files limit
  [Fatal Error] total number of created files now is 100086, which exceeds 100000
  Fix:
  hive> set hive.exec.max.created.files=655350;
  Problem 25: hive metastore connection timeout
  FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
  Fix:
  hive> set hive.metastore.client.socket.timeout=500;
  Problem 26: hive java.io.IOException: error=7, Argument list too long
  Task with the most failures(5):
  Task ID: task_201306241630_0189_r_000009
  URL:
  http://namenode.godlovesdog.com:50030/taskdetails.jsp?jobid=job_201306241630_0189&tipid=task_201306241630_0189_r_000009
  Diagnostic Messages for this Task:
  java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"djh,S1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"xxx,S1"},"alias":0}
  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:520)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"xxx,S1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"djh,S1"},"alias":0}
  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
  ... 7 more
  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20000]: Unable to initialize custom script.
  at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:354)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
  at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)
  ... 7 more
  Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.7": error=7, Argument list too long
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
  at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:313)
  ... 15 more
  Caused by: java.io.IOException: error=7, Argument list too long
  at java.lang.UNIXProcess.forkAndExec(Native Method)
  at java.lang.UNIXProcess.(UNIXProcess.java:135)
  at java.lang.ProcessImpl.start(ProcessImpl.java:130)
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)
  ... 16 more
  FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.MapRedTask. Unable to initialize custom script.
  Solution:
  Upgrade the kernel or reduce the number of partitions: https://issues.apache.org/jira/browse/HIVE-2372
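  error=7 is E2BIG: the combined size of the argument list and environment passed to the transform script exceeded the kernel's ARG_MAX, which is why the linked JIRA suggests fewer partitions (Hive exports job and partition settings into the child environment). A rough way to gauge the headroom on a host (a sketch; os.sysconf is POSIX-only):

```python
import os

def env_bytes(environ=None):
    """Approximate bytes the environment contributes toward ARG_MAX."""
    environ = os.environ if environ is None else environ
    # Each entry costs len("KEY=VALUE") plus a trailing NUL byte.
    return sum(len(k) + len(v) + 2 for k, v in environ.items())

def arg_max():
    """Kernel limit on argv+envp size, or None if unavailable."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (ValueError, OSError):
        return None
```

  When env_bytes() approaches arg_max(), forking any child process from that JVM will fail with E2BIG regardless of the script being launched.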
  Problem 27: hive runtime error
  hive> show tables;
  FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
  FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
  Diagnosis:
  hive -hiveconf hive.root.logger=DEBUG,console
  13/07/15 16:29:24 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxx.xxx.xxx.xxx:9083
  13/07/15 16:29:24 WARN hive.metastore: Failed to connect to the MetaStore Server...
  org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
  ...
  MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
  The client tries port 9083, but netstat shows nothing listening there. The first suspicion was that hiveserver had not started; the hiveserver process did exist, however, listening on port 10000.
  hive-site.xml configures the client to connect on 9083, while hiveserver listens on 10000 by default -- that is the root cause.
  Fix:
  hive --service hiveserver -p 9083
  // or edit the hive.metastore.uris entry in $HIVE_HOME/conf/hive-site.xml
  // and change the port to 10000
  using /usr/lib/hive as HIVE_HOME
  using /var/run/cloudera-scm-agent/process/193-hive-HIVEMETASTORE as HIVE_CONF_DIR
  using /usr/lib/hadoop as HADOOP_HOME
  using /var/run/cloudera-scm-agent/process/193-hive-HIVEMETASTORE/yarn-conf as HADOOP_CONF_DIR
  ERROR: Failed to find hive-hbase storage handler jars to add in hive-site.xml. Hive queries that use Hbase storage handler may not work until this is fixed.
  Wed Oct 22 18:48:53 CST 2014
  JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
  using /usr/java/jdk1.7.0_45-cloudera as JAVA_HOME
  using 5 as CDH_VERSION
  (the same configuration lines and the same hive-hbase storage handler error repeat several more times for process 193-hive-HIVEMETASTORE, and again for process 212-hive-metastore-create-tables)
  Checked whether /usr/lib/hive was intact; it was.
  3:21:09.801 PM  FATAL  org.apache.hadoop.hbase.master.HMaster
  Unhandled exception. Starting shutdown.
  java.io.IOException: error or interrupted while splitting logs in [hdfs://master:8020/hbase/WALs/slave2,60020,1414202360923-splitting] Task = installed = 2 done = 1 error = 1
  at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:362)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
  at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
  at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1070)
  at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:854)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
  at java.lang.Thread.run(Thread.java:744)
  3:46:12.903 PM  FATAL  org.apache.hadoop.hbase.master.HMaster
  Unhandled exception. Starting shutdown.
  java.io.IOException: error or interrupted while splitting logs in [hdfs://master:8020/hbase/WALs/slave2,60020,1414202360923-splitting] Task = installed = 1 done = 0 error = 1
  (same stack trace as above)
  Solution:
  Add the following to hbase-site.xml so the HBase cluster skips hlog splitting at startup:
  <property>
    <name>hbase.master.distributed.log.splitting</name>
    <value>false</value>
  </property>
  Then move the stale splitting directory out of the way:
  [root@master ~]# hadoop fs -mv /hbase/WALs/slave2,60020,1414202360923-splitting/ /test
  [root@master ~]# hadoop fs -ls /test
  2014-10-28 14:31:32,879 INFO [hconnection-0xd18e8a7-shared--pool2-t224] (AsyncProcess.java:673) - #3, table=session_service_201410210000_201410312359, attempt=14/35 failed 1383 ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=session_service_201410210000_201410312359,7499999991,1414203068872.08ee7bb71161cb24e18ddba4c14da0f2., server=slave1,60020,1414380404290, memstoreSize=271430320, blockingMemStoreSize=268435456
  at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2561)
  at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:1963)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4050)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3361)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3265)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26935)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
  at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
  HBase exception reference (Exception: Description)
  ClockOutOfSyncException: thrown by the master when a RegionServer's clock offset is too large.
  DoNotRetryIOException: subclass marking exceptions that should not be retried, e.g. UnknownScannerException.
  DroppedSnapshotException: thrown when the snapshot taken during a flush is not correctly persisted to a file.
  HBaseIOException: all HBase-specific IOExceptions are subclasses of HBaseIOException.
  InvalidFamilyOperationException: a schema-modification request named a column family that does not exist.
  MasterNotRunningException: the master node is not running.
  NamespaceExistException: the namespace already exists.
  NamespaceNotFoundException: the namespace cannot be found.
  NotAllMetaRegionsOnlineException: an operation requires all root and meta regions to be online, and they are not.
  NotServingRegionException: a request was sent to a RegionServer that is not responding or does not serve that region.
  PleaseHoldException: thrown when a RegionServer dies and restarts so quickly that the master has not finished handling the dead instance, when an admin operation arrives while the master is still initializing, or when operating on a RegionServer that is still starting up.
  RegionException: an error occurred while accessing a region.
  RegionTooBusyException: the RegionServer is busy and the request is blocked waiting for service.
  TableExistsException: the table already exists.
  TableInfoMissingException: no .tableinfo file could be found under the table directory.
  TableNotDisabledException: the table is not in the disabled state.
  TableNotEnabledException: the table is not in the enabled state.
  TableNotFoundException: the table cannot be found.
  UnknownRegionException: a request referenced a region the server does not recognize.
  UnknownScannerException: an unrecognized scanner ID was passed to the RegionServer.
  YouAreDeadException: thrown by the master when a RegionServer reports in after having been marked dead.
  ZooKeeperConnectionException: the client could not connect to ZooKeeper.
  INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher
  Waited 90779ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of session_service_201410210000_201410312359,7656249951,1414481868315.bbf0a49fb8a9b650a584769ddd1fdd89.
  When a MemStoreFlusher instance is created it starts MemStoreFlusher.FlushHandler threads; the thread count is set by hbase.hstore.flusher.count and defaults to 1.
  With one machine's disk full and another's not:
  There are 26,632 under-replicated blocks in the cluster, out of 84,822 blocks in total. Percentage of under-replicated blocks: 31.40%. Warning threshold: 10.00%.
  There are 27,278 under-replicated blocks in the cluster, out of 85,476 blocks in total. Percentage of under-replicated blocks: 31.91%. Warning threshold: 10.00%.
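  The warning above is a simple ratio check against a threshold. A sketch reproducing the arithmetic for the first alert:

```python
def under_replicated_pct(under, total):
    """Percentage of under-replicated blocks in the cluster."""
    return 100.0 * under / total

def breaches_threshold(under, total, threshold_pct=10.0):
    """True when the under-replication percentage exceeds the warning threshold."""
    return under_replicated_pct(under, total) > threshold_pct

# First alert from the log: 26,632 of 84,822 blocks -> about 31.40%,
# well above the 10.00% warning threshold.
```

  The ratio only drops once the full disk is freed (or the node decommissioned) and the namenode re-replicates the affected blocks.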
  4:08:53.847 PM  INFO  org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher
  Flushed, sequenceid=45525, memsize=124.2 M, hasBloomFilter=true, into tmp file hdfs://master:8020/hbase/data/default/session_service_201410260000_201410312359/a3b64675b0069b8323665274e2f95cdc/.tmp/b7fa4f5f85354ecc96aa48a09081f786
  4:08:53.862 PM  INFO  org.apache.hadoop.hbase.regionserver.HStore
  Added hdfs://master:8020/hbase/data/default/session_service_201410260000_201410312359/a3b64675b0069b8323665274e2f95cdc/f/b7fa4f5f85354ecc96aa48a09081f786, entries=194552, sequenceid=45525, filesize=47.4 M
  4:09:00.378 PM  WARN  org.apache.hadoop.ipc.RpcServer
  (responseTooSlow): {"processingtimems":39279,"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","client":"192.168.5.9:41284","starttimems":1414656501099,"queuetimems":0,"class":"HRegionServer","responsesize":16,"method":"Scan"}
  4:09:00.379 PM  WARN  org.apache.hadoop.ipc.RpcServer
  RpcServer.responder callId: 33398 service: ClientService methodName: Scan ...
  4:09:00.380 PM  WARN  org.apache.hadoop.ipc.RpcServer
  RpcServer.handler=79,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
  4:09:00.381 PM  INFO  org.apache.hadoop.hbase.regionserver.HRegion
  Finished memstore flush of ~128.1 M/134326016, currentsize=2.4 M/2559256 for region session_service_201410260000_201410312359,6406249959,1414571385831.a3b64675b0069b8323665274e2f95cdc. in 8133ms, sequenceid=45525, compaction requested=false


