[Experience Sharing] Hadoop Learning, Part 31: "XXX.jar is not a valid DFS filename" when integrating HBase with MapReduce on Win7

1. Code


  • HBase in Action (HBase实战) and HBase: The Definitive Guide (HBase权威指南) both contain plenty of introductory-level code; pick whatever interests you and check it out. The repositories are at https://github.com/HBaseinaction and https://github.com/larsgeorge/hbase-book respectively.
  • When I ran the code from the HBase-and-MapReduce integration chapter on Win7, I hit an error. For example, this program: https://github.com/larsgeorge/hbase-book/blob/master/ch07/src/main/java/mapreduce/ParseJson.java (a condensed sketch of its job setup follows below).
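  For context, the job setup in that example boils down to roughly the following condensed sketch (not the book's complete driver; ParseMapper is the mapper class from the book's example, and the table names here are placeholders):

// Condensed sketch of the ParseJson job setup, not the complete driver from the book.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class ParseJsonSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "ParseJson");
    String input = "testtable", output = "testtable"; // placeholder table names
    Scan scan = new Scan();
    // Both utility calls leave addDependencyJars at its default of true,
    // which is what triggers the exception shown in section 2.
    TableMapReduceUtil.initTableMapperJob(input, scan, ParseMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob(output, IdentityTableReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
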
2. Error

Exception in thread "main" java.lang.IllegalArgumentException: Pathname /D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar from hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:184)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:92)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at com.jyz.study.hadoop.hbase.mapreduce.AnalyzeData.main(AnalyzeData.java:249)
3. Tracing the code
  org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil

  public static void addHBaseDependencyJars(Configuration conf) throws IOException {
    addDependencyJars(conf,
      // explicitly pull a class from each module
      org.apache.hadoop.hbase.HConstants.class,                      // hbase-common
      org.apache.hadoop.hbase.protobuf.generated.ClientProtos.class, // hbase-protocol
      org.apache.hadoop.hbase.client.Put.class,                      // hbase-client
      org.apache.hadoop.hbase.CompatibilityFactory.class,            // hbase-hadoop-compat
      org.apache.hadoop.hbase.mapreduce.TableMapper.class,           // hbase-server
      // pull necessary dependencies
      org.apache.zookeeper.ZooKeeper.class,
      org.jboss.netty.channel.ChannelFactory.class,
      com.google.protobuf.Message.class,
      com.google.common.collect.Lists.class,
      org.cloudera.htrace.Trace.class);
  }

public static void addDependencyJars(Configuration conf,
    Class<?>... classes) throws IOException {
  // ... (abridged) for each class passed in, find or build the jar that contains it
  // on the LOCAL filesystem:
  //     Path path = findOrCreateJar(clazz, localFs, packagedClasses);
  //     jars.add(path.toString());
  // and finally record all collected jar paths in the "tmpjars" property:
  conf.set("tmpjars", StringUtils.arrayToString(jars.toArray(new String[jars.size()])));
}
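  findOrCreateJar resolves each class to the jar it was loaded from on the local filesystem (packaging one on the fly if the class is not already inside a jar). A hypothetical illustration of that kind of lookup, not HBase's actual implementation:

// Illustration only: one common way to find the jar a class was loaded from.
// On Win7 this is how file:/D:/... URLs end up being collected into "tmpjars".
import java.security.CodeSource;

public class JarLocator {
  public static String containingJar(Class<?> clazz) {
    CodeSource src = clazz.getProtectionDomain().getCodeSource();
    // e.g. file:/D:/.../lib/hbase-client-0.96.1.1-hadoop2.jar
    return src == null ? null : src.getLocation().toString();
  }

  public static void main(String[] args) {
    System.out.println(containingJar(org.apache.hadoop.hbase.client.Put.class));
  }
}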
  At this point tmpjars contains, for example:

file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-server-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/htrace-core-2.01.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-common-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/guava-12.0.1.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-common-2.2.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-protocol-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-hadoop-compat-0.96.1.1-hadoop2.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/netty-3.6.6.Final.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/protobuf-java-2.5.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-mapreduce-client-core-2.2.0.jar,file:/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/zookeeper-3.4.5.jar
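  You can reproduce a listing like this in your own environment by calling addHBaseDependencyJars directly and printing the property it sets (a small, hypothetical check, assuming HBase 0.96 and its dependencies are on the classpath):

// Hypothetical check: print what addHBaseDependencyJars puts into "tmpjars".
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

public class TmpJarsCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableMapReduceUtil.addHBaseDependencyJars(conf);
    // On Win7 each entry is a file:/D:/... URL, as in the listing above.
    for (String jar : conf.get("tmpjars", "").split(",")) {
      System.out.println(jar);
    }
  }
}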
  In JobSubmitter's copyAndConfigureFiles method:

String libjars = conf.get("tmpjars");
if (libjars != null) {
  FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
  String[] libjarsArr = libjars.split(",");
  for (String tmpjars : libjarsArr) {
    Path tmp = new Path(tmpjars);
    Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
    DistributedCache.addFileToClassPath(
        new Path(newPath.toUri().getPath()), conf);
  }
}
  copyRemoteFiles copies these jars to the jobtracker's filesystem and returns the path each jar was copied to.
  When the job runs in the cluster environment, this returns:

[hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-client-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-server-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/htrace-core-2.01.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-common-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/guava-12.0.1.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hadoop-common-2.2.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-protocol-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hbase-hadoop-compat-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/netty-3.6.6.Final.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/protobuf-java-2.5.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/hadoop-mapreduce-client-core-2.2.0.jar, hdfs://192.168.1.200:9000/tmp/hadoop-yarn/staging/root/.staging/job_1396339976222_0035/libjars/zookeeper-3.4.5.jar]
  When running locally (from Win7), it returns:

[hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-server-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/htrace-core-2.01.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-common-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/guava-12.0.1.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-common-2.2.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-protocol-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hbase-hadoop-compat-0.96.1.1-hadoop2.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/netty-3.6.6.Final.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/protobuf-java-2.5.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/hadoop-mapreduce-client-core-2.2.0.jar, hdfs://192.168.1.200:9000/D:/GoogleCode/platform-components/trunk/SourceCode/study-hadoop/lib/zookeeper-3.4.5.jar]
  Later on, both batches of URLs are checked through the Hadoop filesystem. That is exactly where the problem lies: the check does not distinguish between the local Windows filesystem and the cluster's Hadoop filesystem, though it should. This is why submitting the job from inside the cluster works fine while running it locally fails with the exception above. When I find some time I will create an issue on the Hadoop JIRA.
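  The effect can be reproduced without a cluster: once the scheme is stripped from a Windows path (as the addFileToClassPath call above effectively does), qualifying it against an hdfs:// default filesystem yields exactly the pathname the exception complains about. A minimal sketch, assuming the same fs.defaultFS as in the error and a shortened, made-up jar path:

// Minimal sketch (hypothetical paths, no live cluster needed) of how a scheme-stripped
// Windows path becomes the invalid "hdfs://.../D:/..." pathname seen in the stack trace.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PathResolutionDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://192.168.1.200:9000");

    // A local jar URL like the ones in tmpjars (path shortened here for readability).
    Path local = new Path("file:/D:/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar");

    // Keep only the path component, as addFileToClassPath(new Path(newPath.toUri().getPath()), conf) does.
    Path stripped = new Path(local.toUri().getPath());

    // Qualifying it against the default (HDFS) filesystem produces the pathname that
    // DistributedFileSystem.getPathName rejects as not a valid DFS filename.
    Path qualified = stripped.makeQualified(URI.create(conf.get("fs.defaultFS")), new Path("/"));
    System.out.println(qualified); // hdfs://192.168.1.200:9000/D:/study-hadoop/lib/hbase-client-0.96.1.1-hadoop2.jar
  }
}
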
4. Workaround to get the code running
  TableMapReduceUtil provides many overloads of initTableMapperJob and initTableReducerJob, some of which let you specify the parameter

   * @param addDependencyJars upload HBase jars and jars for any of the configured
*           job classes via the distributed cache (tmpjars).
  It is precisely because addDependencyJars defaults to true that the error above is triggered:

if (addDependencyJars) {
  addDependencyJars(job);
}
  So we can set it to false. Change the code in https://github.com/larsgeorge/hbase-book/blob/master/ch07/src/main/java/mapreduce/ParseJson.java to:

TableMapReduceUtil.initTableMapperJob(input, scan, ParseMapper.class, // co ParseJson-3-SetMap Setup map phase details using the utility method.
    ImmutableBytesWritable.class, Put.class, job, false);
TableMapReduceUtil.initTableReducerJob(output, // co ParseJson-4-SetReduce Configure an identity reducer to store the parsed data.
    IdentityTableReducer.class, job, null, null, null, null, false);
  The job now runs normally. Checking the results, the data in testtable's data:json column has been split into data:column1, data:column2, ... as expected.
