Hadoop on Mac with IntelliJ IDEA

234cfds1 · Posted 2015-7-13 08:56:50

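This post walks through how I resolved Hadoop's "input path does not exist" error when running a job from IntelliJ IDEA.
Environment: Mac OS X 10.9.5, IntelliJ IDEA 13.1.4, Hadoop 1.2.1.
Hadoop runs in a virtual machine that the host connects to over SSH; the IDE and the data file are on the host.
This is my third day of teaching myself Hadoop. I have done a little .NET development before, but Mac, IntelliJ IDEA, Hadoop, and CentOS are all quite unfamiliar to me. My very first Hadoop program ran into a problem.
The code below is the first listing in Chapter 4 of "Hadoop in Action".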

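import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    // Mapper: swaps key and value, so each record is emitted as (value, key).
    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(value, key);
        }
    }

    // Reducer: joins all values of a key into one comma-separated string.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            StringBuilder csv = new StringBuilder();
            while (values.hasNext()) {
                if (csv.length() > 0) {
                    csv.append(", ");
                }
                csv.append(values.next().toString());
            }
            output.collect(key, new Text(csv.toString()));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration configuration = getConf();

        JobConf job = new JobConf(configuration, MyJob.class);

        // args[0] is the input file, args[1] is the output directory
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // split each input line into key and value at the first comma
        job.set("key.value.separator.in.input.line", ",");

        JobClient.runJob(job);

        return 0;
    }

    public static void main(String[] args) {
        try {
            int res = ToolRunner.run(new Configuration(), new MyJob(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}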
The main function adds exception handling; everything else matches the book.
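To get a feel for what this job computes without a cluster, here is a minimal plain-Java sketch of the same logic: the map step swaps key and value, and the reduce step joins all values of a key with ", ". The class name InvertSketch, the method invert, and the sample lines are hypothetical and not from the book. A sorted map stands in for Hadoop's shuffle, and the order of values within a key is not something the real framework guarantees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertSketch {
    // Hypothetical helper: mimics MapClass + Reduce on in-memory "key,value" lines.
    static Map<String, String> invert(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            // Split at the first comma; a line without one becomes (line, "").
            int sep = line.indexOf(',');
            String key = sep < 0 ? line : line.substring(0, sep);
            String value = sep < 0 ? "" : line.substring(sep + 1);
            // map step: emit (value, key), grouped by the new key
            grouped.computeIfAbsent(value, k -> new ArrayList<>()).add(key);
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            // reduce step: join the grouped values with ", "
            result.put(e.getKey(), String.join(", ", e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample records in "key,value" form.
        List<String> lines = Arrays.asList("1,100", "2,100", "3,200");
        for (Map.Entry<String, String> e : invert(lines).entrySet()) {
            // Tab between key and value, similar to the job's text output.
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With the sample lines, key 100 collects "1, 2" and key 200 collects "3".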
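I ran the code directly in IDEA. My data file lives in a different directory from the book's, so the command-line arguments differ slightly from the book's: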

/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt output
The IDEA run configuration is shown in the figure.

The location of the data file is shown in the figure.

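The configuration above has no spelling mistakes. So I happily pressed 'Run MyJob.main()' and waited for the result, ready to carry on with the book.
No such luck: IDEA printed "input path does not exist". The input path it reported was /Users/michael/IdeaProjects/Hadoop/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt. That is the working directory with my first argument appended to it. What was going on?
In the whole program, only the run method uses Path, so the problem was probably there.
After FileOutputFormat.setOutputPath(job, out); I added System.out.println(FileInputFormat.getInputPaths(job)[0].toUri()); and saw that the input path really had been merged under the working directory. No wonder it failed. (Someone on StackOverflow said this error occurs when the data file has not been uploaded to Hadoop.)
At this point I could conclude that FileInputFormat.setInputPaths(job, in); was causing the problem, so I went into the source to see how it works.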

/**
 * Set the array of {@link Path}s as the list of inputs
 * for the map-reduce job.
 *
 * @param conf Configuration of the job.
 * @param inputPaths the {@link Path}s of the input directories/files
 * for the map-reduce job.
 */
public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for (int i = 1; i < inputPaths.length; i++) {
        str.append(StringUtils.COMMA_STR);
        path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
        str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set("mapred.input.dir", str.toString());
}
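As you can see, the very first statement combines the working directory from conf with inputPaths. Since it merges in the working directory, I decided to take the working directory out of the picture.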
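The effect of resolving a path against a base directory can be illustrated with java.net.URI from the JDK, with no Hadoop on the classpath. This is only an analogy under stated assumptions: the class ResolveSketch and the method resolve are hypothetical, the sketch does not call Hadoop's Path, and it uses a path written without a leading slash to reproduce the kind of doubled path seen in the error message.

```java
import java.net.URI;

class ResolveSketch {
    // Hypothetical helper: resolve child against a base directory, URI-style.
    static String resolve(String baseDir, String child) {
        // A trailing slash makes URI resolution treat the base as a directory.
        String base = baseDir.endsWith("/") ? baseDir : baseDir + "/";
        return URI.create("file://" + base).resolve(child).getPath();
    }

    public static void main(String[] args) {
        String workingDir = "/Users/michael/IdeaProjects/Hadoop";
        String relative = "Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt";

        // Resolved against the project directory, the path ends up nested under it:
        // /Users/michael/IdeaProjects/Hadoop/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt
        System.out.println(resolve(workingDir, relative));

        // Resolved against the root directory, it lands where the data file is:
        // /Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt
        System.out.println(resolve("/", relative));
    }
}
```

The second call is the idea behind temporarily switching the working directory to "/": whatever is merged with the root comes out as a top-level path.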
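Before FileInputFormat.setInputPaths(job, in);, save the current working directory:
    Path workingDirectoryBak = job.getWorkingDirectory();
Then set the working directory to the root directory:
    job.setWorkingDirectory(new Path("/"));
After the setInputPaths call, restore it:
    job.setWorkingDirectory(workingDirectoryBak);
Add a print statement to confirm the result:
    System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());
The new code is below. The input method on the Mac is awkward to use, so I wrote the comments directly in my own rough English.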

    @Override
    public int run(String[] args) throws Exception {
        Configuration configuration = getConf();

        JobConf job = new JobConf(configuration, MyJob.class);

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        // back up the current working directory, namely
        // /Users/michael/IdeaProjects/Hadoop, where the source is located
        Path workingDirectoryBak = job.getWorkingDirectory();
        // temporarily set the working directory to the root directory
        job.setWorkingDirectory(new Path("/"));
        // let setInputPaths combine the root directory with the input path
        FileInputFormat.setInputPaths(job, in);
        // restore the original working directory
        job.setWorkingDirectory(workingDirectoryBak);
        // print the resulting input path to confirm
        System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());

        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");

        JobClient.runJob(job);

        return 0;
    }
I tried again and it ran correctly, finishing in just under a minute. That is what a low-spec machine gets you.
