Hadoop Study Notes 36: "Bulk load operation did not find any files" when using BulkLoad

arongsoft · posted 2016-12-13 09:40:44

1. The error
I hit the following error while importing data into HBase with BulkLoad:

2014-04-04 15:39:08,521 WARN org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles - Bulk load operation did not find any files to load in directory hdfs://192.168.1.200:9000/user/root/output1. Does it contain files in subdirectories that correspond to column family names?
I then checked the temporary output directory of the MapReduce job. Sure enough, there was no data folder in it, only a _SUCCESS file.
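The manual check above can be sketched in code. This is only a local-filesystem analogue using java.nio; the real output lives on HDFS and would need the Hadoop FileSystem API instead. The class and method names (FamilyDirCheck, candidateFamilyDirs, emptyJobOutput) are hypothetical, and skipping names that start with "_" is an assumption of this sketch, based on the warning's hint that the loader expects one subdirectory per column family.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FamilyDirCheck {
    /** Returns the sorted names of subdirectories that could be column-family directories. */
    static List<String> candidateFamilyDirs(Path outputDir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(outputDir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                // Assumption: entries such as _SUCCESS or _logs are job markers, not families.
                if (Files.isDirectory(child) && !name.startsWith("_")) {
                    names.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    /** Creates a temp directory that holds only _SUCCESS, like the failed job's output. */
    static Path emptyJobOutput() {
        try {
            Path dir = Files.createTempDirectory("output1");
            Files.createFile(dir.resolve("_SUCCESS"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = args.length > 0 ? Paths.get(args[0]) : emptyJobOutput();
        List<String> families = candidateFamilyDirs(dir);
        System.out.println(families.isEmpty()
                ? "no column-family directories in " + dir
                : "column-family directories: " + families);
    }
}
```

Run without arguments it reproduces the situation described above: a directory with nothing but _SUCCESS yields an empty list, which is the case the warning complains about.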
2. job.setMapOutputValueClass vs. job.setOutputValueClass
This had to be a problem on the reduce side, so I looked at what HFileOutputFormat.configureIncrementalLoad(job, htable); actually does.

job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(KeyValue.class);
job.setOutputFormatClass(HFileOutputFormat.class);
// Based on the configured map output class, set the correct reducer to properly
// sort the incoming values.
// TODO it would be nice to pick one or the other of these formats.
if (KeyValue.class.equals(job.getMapOutputValueClass())) {
    job.setReducerClass(KeyValueSortReducer.class);
} else if (Put.class.equals(job.getMapOutputValueClass())) {
    job.setReducerClass(PutSortReducer.class);
} else if (Text.class.equals(job.getMapOutputValueClass())) {
    job.setReducerClass(TextSortReducer.class);
} else {
    LOG.warn("Unknown map output value type:" + job.getMapOutputValueClass());
}
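While debugging I found that job.getMapOutputValueClass() returned KeyValue. So what is the difference between job.setMapOutputValueClass and job.setOutputValueClass? These are the methods and the configuration keys they read or write: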

Method                    Configuration key
getOutputValueClass       mapreduce.job.output.value.class
setOutputValueClass       mapreduce.job.output.value.class
setMapOutputValueClass    mapreduce.map.output.value.class
getMapOutputValueClass    mapreduce.map.output.value.class

/**
 * Set the value class for the map output data. This allows the user to
 * specify the map output value class to be different than the final output
 * value class.
 *
 * @param theClass the map output value class.
 * @throws IllegalStateException if the job is submitted
 */
public void setMapOutputValueClass(Class<?> theClass
                                   ) throws IllegalStateException {
    ensureState(JobState.DEFINE);
    conf.setMapOutputValueClass(theClass);
}

/**
 * Get the value class for the map output data. If it is not set, use the
 * (final) output value class This allows the map output value class to be
 * different than the final output value class.
 *
 * @return the map output value class.
 */
public Class<?> getMapOutputValueClass() {
    Class<?> retv = getClass(JobContext.MAP_OUTPUT_VALUE_CLASS, null,
                             Object.class);
    if (retv == null) {
        retv = getOutputValueClass();
    }
    return retv;
}
In other words:

When setMapOutputValueClass has not been called, getMapOutputValueClass returns the value given to setOutputValueClass.
The map output value class (getMapOutputValueClass) is allowed to differ from the final, i.e. reduce, output value class (getOutputValueClass). The generic signature PutSortReducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, KeyValue> shows this: the map output value class is Put and the final one is KeyValue.
The same applies to the key class.

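My program called job.setOutputValueClass(Put.class). configureIncrementalLoad overwrites the output value class with KeyValue, so the Put I had set was lost and getMapOutputValueClass fell back to KeyValue. Changing the call to job.setMapOutputValueClass(Put.class) fixed the problem.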
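The interaction between the fallback and the reducer selection can be replayed without a cluster. The following is a minimal simulation in plain Java, not the Hadoop or HBase code itself: MiniJob, the nested stand-in classes KeyValue, Put and Text, and the reducer names returned as strings are all hypothetical placeholders that only mirror the snippets quoted above.

```java
import java.util.HashMap;
import java.util.Map;

public class MapOutputValueClassDemo {
    // Stand-ins for the real HBase/Hadoop types; only their identity matters here.
    static final class KeyValue {}
    static final class Put {}
    static final class Text {}

    /** Toy job configuration holding the two properties from the table above. */
    static final class MiniJob {
        private final Map<String, Class<?>> conf = new HashMap<>();

        void setOutputValueClass(Class<?> c) {
            conf.put("mapreduce.job.output.value.class", c);
        }

        void setMapOutputValueClass(Class<?> c) {
            conf.put("mapreduce.map.output.value.class", c);
        }

        Class<?> getOutputValueClass() {
            // Object.class is just this toy's default, not a claim about Hadoop's.
            return conf.getOrDefault("mapreduce.job.output.value.class", Object.class);
        }

        // Same fallback as the quoted getMapOutputValueClass().
        Class<?> getMapOutputValueClass() {
            Class<?> c = conf.get("mapreduce.map.output.value.class");
            return c != null ? c : getOutputValueClass();
        }
    }

    /** Mirrors the quoted branch of configureIncrementalLoad; returns the chosen reducer's name. */
    static String configureIncrementalLoad(MiniJob job) {
        job.setOutputValueClass(KeyValue.class); // overwrites whatever the driver set
        Class<?> mapValue = job.getMapOutputValueClass();
        if (KeyValue.class.equals(mapValue)) return "KeyValueSortReducer";
        if (Put.class.equals(mapValue)) return "PutSortReducer";
        if (Text.class.equals(mapValue)) return "TextSortReducer";
        return "(none, unknown map output value type)";
    }

    static String reducerBeforeFix() {
        MiniJob job = new MiniJob();
        job.setOutputValueClass(Put.class);    // lost when configureIncrementalLoad runs
        return configureIncrementalLoad(job);
    }

    static String reducerAfterFix() {
        MiniJob job = new MiniJob();
        job.setMapOutputValueClass(Put.class); // stored under a separate key, so it survives
        return configureIncrementalLoad(job);
    }

    public static void main(String[] args) {
        System.out.println("before fix: " + reducerBeforeFix());
        System.out.println("after fix:  " + reducerAfterFix());
    }
}
```

In this simulation the "before" variant ends up with KeyValueSortReducer, matching the KeyValue seen in the debugger, while the "after" variant selects PutSortReducer.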
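3. Deleting all data in HBase
This has nothing to do with the main topic; treat it as a side note.
Yesterday it occurred to me to ask whether HBase can be "formatted" without reinstalling it.
My first idea was to delete the hbase directory on HDFS and restart HBase, but then the RegionServer could not connect to the Master. Presumably the -ROOT- and .META. tables had been deleted with the directory while ZooKeeper still held state that referred to them, so the RegionServer could not be reported to the Master. After deleting the ZooKeeper data as well, the restart succeeded.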

rm -rf /tmp/hbase-root*

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/tmp/hbase-root</value> <!-- default -->
  <description>Property from ZooKeeper's config zoo.cfg.
  The directory where the snapshot is stored.
  </description>
</property>
