Notes on Hadoop Problems

111 · Posted on 2015-7-12 08:27:44

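1. A MapReduce program developed in Eclipse runs locally (LocalJobRunner) instead of on the cluster.
Solution: package the program as a jar and run it with the hadoop command line. Use the Fat Jar tool to bundle the third-party jars into the package, and do not tick One-JAR; otherwise launching the jar fails with:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to locate com.simontuffs.onejar.Boot in the java.class.path: consider using -Done-jar.jar.path to specify the one-jar filename.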
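The error above appears when the jar's manifest points at One-JAR's launcher class. A minimal Python sketch (standard library only; the function names are hypothetical, and it assumes, as the error message suggests, that One-JAR jars use com.simontuffs.onejar.Boot as their Main-Class) that reads META-INF/MANIFEST.MF so a jar can be checked before submitting it:

```python
import zipfile

# Launcher class that One-JAR puts in the manifest (as named in the error above).
ONE_JAR_BOOT = "com.simontuffs.onejar.Boot"


def manifest_main_class(jar_path):
    """Return the Main-Class attribute from a jar's manifest, or None."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:  # the jar has no manifest at all
            return None
    # Long manifest lines continue on the next line, which starts with one space.
    logical = []
    for line in raw.splitlines():
        if line.startswith(" ") and logical:
            logical[-1] += line[1:]
        else:
            logical.append(line)
    for line in logical:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Main-Class":
            return value.strip()
    return None


def launches_via_one_jar(jar_path):
    """True if the jar's entry point is One-JAR's Boot class."""
    return manifest_main_class(jar_path) == ONE_JAR_BOOT
```

For a correctly packaged Fat Jar, manifest_main_class("wordcount.jar") (hypothetical file name) should return your own driver class rather than the One-JAR Boot class.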
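2. FAILED Too many fetch-failures
Solution:
1) Check /etc/hosts.
Each machine's own IP must map to its hostname.
The file must list the IP and hostname of every server.
If the top of /etc/hosts contains entries like these:
    127.0.0.1   localhost   your_hostname
    ::1         localhost6  your_hostname
commenting out these two lines (or deleting your_hostname from them) resolves the error.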
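2) Check .ssh/authorized_keys.
It must contain the public keys of all servers (including the server itself).

Even though passwordless SSH between the nodes was configured before installing Hadoop, suppose there are 3 IPs, 192.168.128.131, 192.168.128.132 and 192.168.128.133, with hostnames master, slave1 and slave2. The first time each node runs $ ssh <hostname> (master, slave1, slave2), SSH shows a yes/no prompt about the host key; after confirming it, later connections work normally. If this manual step was skipped and the IPs in hadoop/conf/core-site.xml and mapred-site.xml have been replaced by hostnames, this exception is very likely to occur.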
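Both checks above can be scripted. A minimal Python sketch, assuming the files are read as plain text and that node_pubkeys maps each node name to the contents of its ~/.ssh/id_rsa.pub (a hypothetical layout); all function names are illustrative and not part of Hadoop or OpenSSH, and the key parser is deliberately simple:

```python
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def loopback_hostname_lines(hosts_text, hostname):
    """Return (line_number, line) for /etc/hosts entries that map a loopback
    address (127.x.x.x or ::1) to `hostname`, i.e. the entries to comment out."""
    hits = []
    for no, raw in enumerate(hosts_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # ignore comments
        if len(fields) < 2:
            continue
        addr, names = fields[0], fields[1:]
        if (addr == "::1" or addr.startswith("127.")) and hostname in names:
            hits.append((no, raw))
    return hits


def missing_cluster_entries(hosts_text, expected):
    """Return the (ip, hostname) pairs from `expected` that /etc/hosts does not map."""
    mapped = set()
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if fields:
            mapped.update((fields[0], name) for name in fields[1:])
    return [(ip, name) for ip, name in expected.items() if (ip, name) not in mapped]


def key_id(line):
    """Return (key type, base64 blob) for a public-key line, or None.
    Options before the key type and the trailing comment are ignored."""
    if line.lstrip().startswith("#"):
        return None
    tokens = line.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok.startswith(KEY_TYPE_PREFIXES):
            return tok, tokens[i + 1]
    return None


def missing_keys(authorized_keys_text, node_pubkeys):
    """Return the nodes whose public key is absent from authorized_keys."""
    present = {key_id(line) for line in authorized_keys_text.splitlines()}
    present.discard(None)
    return [node for node, pub in node_pubkeys.items() if key_id(pub) not in present]
```

Pass socket.gethostname() as `hostname` and the three IP/hostname pairs above as `expected`. The first-connection host-key prompt is not covered; each host still has to be connected to once (or ~/.ssh/known_hosts pre-populated, for example with ssh-keyscan).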
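3. Fixing a job that always gets only 1 reduce task on Hadoop

Hadoop parameters are affected by client-side settings. When my job ran on Hadoop, the number of reduce tasks was always 1. Look at the configuration files in the conf folder of the Hadoop installation, /conf/hadoop-site.xml or /conf/hadoop-default.xml, and search for: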
　　
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
    <description>The default number of reduce tasks per job.  Typically set
    to a prime close to the number of available hosts.  Ignored when
    mapred.job.tracker is "local".</description>
  </property>


Change this parameter to:
　　
  <property>
    <name>mapred.reduce.tasks</name>
    <value>11</value>
    <description>The default number of reduce tasks per job.  Typically set
    to a prime close to the number of available hosts.  Ignored when
    mapred.job.tracker is "local".</description>
  </property>


Run the job again and check the number of reduce tasks: it is now 11, so the change worked.
Reference: http://blog.chinaunix.net/uid-1838361-id-287231.html
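The edit above can also be scripted. A minimal Python sketch using xml.etree.ElementTree, assuming the file follows Hadoop's usual <configuration>/<property>/<name>/<value> layout; get_property and set_property are hypothetical helper names. ElementTree drops XML comments and processing instructions when it rewrites a document, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def get_property(config_xml, name):
    """Return the <value> text of property `name`, or None if it is not set."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value")
    return None


def set_property(config_xml, name, value):
    """Return config_xml with property `name` set to `value`.

    An existing <property> is updated in place; otherwise a new one is
    appended under the <configuration> root.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = str(value)
            break
    else:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Since client-side settings take effect, the count can also be set per job instead of cluster-wide: with Job#setNumReduceTasks(int) in the Java API, or with -D mapred.reduce.tasks=11 on the command line when the driver runs through ToolRunner.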
4. org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException
5. HBase shell error: NativeException: org.apache.hadoop.hbase.MasterNotRunningException: null
References: http://blog.sina.com.cn/s/blog_718335510100zchp.html  http://www.iyunv.com/tangtianfly/archive/2012/04/11/2441760.html
6. ZooKeeper session expired
References: http://jiajun.iteye.com/blog/1013215  http://www.kuqin.com/system-analysis/20110910/264590.html

7. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/Slave2,60020,1366353790042/Slave2%2C60020%2C1366353790042.1366353792650 File does not exist. [Lease.  Holder: DFSClient_hb_rs_Slave2,60020,1366353790042, pendingcreates: 1]
Edit Hadoop's configuration file conf/hdfs-site.xml and add:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

To be confirmed!
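If you apply this (still unconfirmed) change, a minimal Python sketch like the one below can verify that it is actually in place, assuming the usual hdfs-site.xml layout; xceiver_limit and xceivers_ok are hypothetical names. The misspelling "xcievers" is the property's real name; Hadoop 2 renamed it to dfs.datanode.max.transfer.threads, so the sketch accepts both.

```python
import xml.etree.ElementTree as ET

# Original (misspelled) name and its Hadoop 2 replacement.
XCEIVER_KEYS = ("dfs.datanode.max.xcievers", "dfs.datanode.max.transfer.threads")


def xceiver_limit(hdfs_site_xml):
    """Return the configured DataNode transfer-thread limit, or None if unset.
    Raises ValueError if the <value> is not an integer."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() in XCEIVER_KEYS:
            return int(prop.findtext("value", "").strip())
    return None


def xceivers_ok(hdfs_site_xml, minimum=4096):
    """True if the limit is set explicitly and is at least `minimum`."""
    limit = xceiver_limit(hdfs_site_xml)
    return limit is not None and limit >= minimum
```

The DataNodes only pick up a changed value after a restart.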

8. Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface

Reference: http://stackoverflow.com/questions/6007725/hbase-error-assignment-of-root-failure

9. org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode

10. Fixing too many connections when clients connect to HBase through ZooKeeper

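Cause: the number of connections that the client program opens to HBase through ZooKeeper exceeds the default connection limit (30 by default); once the connections run out, subsequent connections cannot be established.
Solution: add the following property to hbase-site.xml: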
　　
  <property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>300</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    Limit on number of concurrent connections (at the socket level) that a
    single client, identified by IP address, may make to a single member of
    the ZooKeeper ensemble. Set high to avoid zk connection issues running
    standalone and pseudo-distributed.</description>
  </property>



　　
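I set the maximum number of connections to 300, but the same error still appeared: the limit had not taken effect. Following the hint in the property description, I edited the zoo.cfg configuration file directly and added:

maxClientCnxns=300

After restarting ZooKeeper and HBase and testing again, the problem was solved.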
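The zoo.cfg change can be scripted as well. A minimal Python sketch, assuming zoo.cfg consists of key=value lines with # comments (ZooKeeper's usual format); set_zoo_cfg and read_zoo_cfg are hypothetical names. One possible reason the hbase-site.xml property had no effect: HBase only forwards hbase.zookeeper.property.* settings when it manages its own ZooKeeper, while a separately run ensemble reads zoo.cfg.

```python
def read_zoo_cfg(cfg_text):
    """Parse zoo.cfg text into a dict; the last occurrence of a key wins."""
    result = {}
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key, value = stripped.split("=", 1)
            result[key.strip()] = value.strip()
    return result


def set_zoo_cfg(cfg_text, key, value):
    """Return zoo.cfg text with `key=value`, replacing the first uncommented
    entry for `key` or appending a new line at the end."""
    lines = cfg_text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # comments and blank lines are left untouched
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

After writing the result back to zoo.cfg, ZooKeeper and HBase still need the restart described above.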
11. Job failed: # of failed Reduce Tasks exceeded allowed limit. FailedCount:
Reference: http://blog.163.com/zhengjiu_520/blog/static/3559830620130743644473/
12. FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException
References: http://blog.sina.com.cn/s/blog_53765cf90101auqo.html (to be confirmed)  http://www.codesky.net/article/201206/171897.html

13. HBase Lease Exception
Set hbase.regionserver.lease.period and hbase.rpc.timeout so that hbase.rpc.timeout >= hbase.regionserver.lease.period.
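A minimal Python sketch that checks this relationship in an hbase-site.xml file (both properties are in milliseconds); hbase_props and lease_timeout_consistent are hypothetical names. It returns None when either property is not set explicitly, rather than guessing HBase's built-in defaults.

```python
import xml.etree.ElementTree as ET


def hbase_props(hbase_site_xml):
    """Map property name -> value for every <property> in hbase-site.xml."""
    root = ET.fromstring(hbase_site_xml)
    return {
        p.findtext("name", "").strip(): p.findtext("value", "").strip()
        for p in root.iter("property")
    }


def lease_timeout_consistent(hbase_site_xml):
    """True/False for hbase.rpc.timeout >= hbase.regionserver.lease.period,
    or None if either property is missing from the file."""
    props = hbase_props(hbase_site_xml)
    try:
        rpc = int(props["hbase.rpc.timeout"])
        lease = int(props["hbase.regionserver.lease.period"])
    except KeyError:
        return None
    return rpc >= lease
```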
14. Task attempt_failed to report status for 600 seconds. Killing!
Reference: http://stackoverflow.com/questions/5864589/how-to-fix-task-attempt-201104251139-0295-r-000006-0-failed-to-report-status-fo
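The 600 seconds come from mapred.task.timeout (600000 ms by default): a task is killed if it neither reads input, writes output, nor updates its status for that long. The usual remedies are raising that timeout or reporting progress from long-running code (context.progress() in the Java API). A minimal Python sketch, assuming the job runs through Hadoop Streaming, which accepts `reporter:status:<message>` lines on stderr as status updates; the key/count summing logic and the function name are hypothetical:

```python
import time


def reduce_with_heartbeat(lines, out, err, interval=60.0, clock=time.monotonic):
    """Sum tab-separated `key<TAB>count` lines grouped by key (input already
    sorted by key, as Streaming delivers it), writing a `reporter:status:`
    line to `err` at most once every `interval` seconds."""
    last_beat = clock()
    current, total, seen = None, 0, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                out.write(f"{current}\t{total}\n")
            current, total = key, 0
        total += int(value)
        seen += 1
        now = clock()
        if now - last_beat >= interval:
            # Status update so the framework sees the task is still alive.
            err.write(f"reporter:status:processed {seen} records\n")
            err.flush()
            last_beat = now
    if current is not None:
        out.write(f"{current}\t{total}\n")
```

In the reducer script this would be called as reduce_with_heartbeat(sys.stdin, sys.stdout, sys.stderr); the clock parameter only exists so the heartbeat timing can be tested.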
