Text代码
2013-06-21 18:53:39,182 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node x.x.x.x:50010 is attempting to report storage ID DS-1357535176-x.x.x.x-50010-1371808472808. Node y.y.y.y:50010 is expected to serve this storage.
原因分析:
拷贝hadoop安装包时,包含data与tmp文件夹(见本人《hadoop安装》一文),未成功格式化datanode
解决办法:
Shell代码
rm -rf /data/hadoop/hadoop-1.1.2/data
rm -rf /data/hadoop/hadoop-1.1.2/tmp
hadoop datanode -format
2. safe mode
Text代码
2013-06-20 10:35:43,758 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew lease for DFSClient_hb_rs_wdev1.corp.qihoo.net,60020,1371631589073. Name node is in safe mode.
解决方案:
Shell代码
hadoop dfsadmin -safemode leave
3.连接异常
Text代码
2013-06-21 19:55:05,801 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to homename/x.x.x.x:9000 failed on local exception: java.io.EOFException
可能原因:
hive错误
1.NoClassDefFoundError
Could not initialize class java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HbaseObjectWritable
将protobuf-***.jar添加到jars路径
Xml代码
//$HIVE_HOME/conf/hive-site.xml
hive.aux.jars.path
file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar
2.hive动态分区异常
[Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode
Shell代码
hive> set hive.exec.max.dynamic.partitions.pernode = 10000;
3.mapreduce进程超内存限制——hadoop Java heap space
vim mapred-site.xml添加:
Xml代码
//mapred-site.xml
mapred.child.java.opts
-Xmx2048m
Shell代码
#$HADOOP_HOME/conf/hadoop_env.sh
export HADOOP_HEAPSIZE=5000
4.hive文件数限制
[Fatal Error] total number of created files now is 100086, which exceeds 100000
Shell代码
hive> set hive.exec.max.created.files=655350;
5.metastore连接超时
Text代码
FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
解决方案:
Shell代码
hive> set hive.metastore.client.socket.timeout=500;
6. java.io.IOException: error=7, Argument list too long
Text代码
Task with the most failures(5):
-----
Task ID:
task_201306241630_0189_r_000009
URL: http://namenode.godlovesdog.com: ... 41630_0189_r_000009
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"djh,S1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"xxx,S1"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:520)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"xxx,S1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"djh,S1"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20000]: Unable to initialize custom script.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:354)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)
... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.7": error=7, 参数列表过长
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:313)
... 15 more
Caused by: java.io.IOException: error=7, 参数列表过长
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:135)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)
... 16 more
FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.MapRedTask. Unable to initialize custom script.
解决方案:
升级内核或减少分区数https://issues.apache.org/jira/browse/HIVE-2372
6.runtime error
Shell代码
hive> show tables;
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
问题排查:
Shell代码
hive -hiveconf hive.root.logger=DEBUG,console
Text代码
13/07/15 16:29:24 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxx.xxx.xxx.xxx:9083
13/07/15 16:29:24 WARN hive.metastore: Failed to connect to the MetaStore Server...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 拒绝连接
。。。
MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: 拒绝连接
尝试连接9083端口,netstat查看该端口确实没有被监听,第一反应是hiveserver没有正常启动。查看hiveserver进程却存在,只是监听10000端口。
查看hive-site.xml配置,hive客户端连接9083端口,而hiveserver默认监听10000,找到问题根源了
解决办法:
Shell代码
hive --service hiveserver -p 9083
//或修改$HIVE_HOME/conf/hive-site.xml的hive.metastore.uris部分
//将端口改为10000