Hadoop版Helloworld之wordcount运行示例

cf2000 · 发表于 2017-12-17 20:07:23

　　1.编写一个统计单词数量的java程序，并命名为wordcount.java，代码如下：
　　

import java.io.IOException;　　

import java.util.StringTokenizer;　　

　　

import org.apache.hadoop.conf.Configuration;　　

import org.apache.hadoop.fs.Path;　　

import org.apache.hadoop.io.IntWritable;　　

import org.apache.hadoop.io.Text;　　

import org.apache.hadoop.mapreduce.Job;　　

import org.apache.hadoop.mapreduce.Mapper;　　

import org.apache.hadoop.mapreduce.Reducer;　　

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;　　

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;　　

　　

public>
　　

public static>
extends Mapper<Object, Text, Text, IntWritable>{　　

　　

private final static IntWritable one = new IntWritable(1);　　

private Text word = new Text();　　

　　

public void map(Object key, Text value, Context context　　
)
throws IOException, InterruptedException {　　
StringTokenizer itr
= new StringTokenizer(value.toString());　　

while (itr.hasMoreTokens()) {　　
word.set(itr.nextToken());
　　
context.write(word, one);
　　
}
　　
}
　　
}
　　

　　

public static>
extends Reducer<Text,IntWritable,Text,IntWritable> {　　

private IntWritable result = new IntWritable();　　

　　

public void reduce(Text key, Iterable<IntWritable> values,　　
Context context
　　
)
throws IOException, InterruptedException {　　

int sum = 0;　　

for (IntWritable val : values) {　　
sum
+= val.get();　　
}
　　
result.set(sum);
　　
context.write(key, result);
　　
}
　　
}
　　

　　

public static void main(String[] args) throws Exception {　　
Configuration conf
= new Configuration();　　
Job job
= Job.getInstance(conf, "word count");　　
job.setJarByClass(WordCount.
class);　　
job.setMapperClass(TokenizerMapper.
class);　　
job.setCombinerClass(IntSumReducer.
class);　　
job.setReducerClass(IntSumReducer.
class);　　
job.setOutputKeyClass(Text.
class);　　
job.setOutputValueClass(IntWritable.
class);　　
FileInputFormat.addInputPath(job,
new Path(args[0]));　　
FileOutputFormat.setOutputPath(job,
new Path(args[1]));　　
System.exit(job.waitForCompletion(
true) ? 0 : 1);　　
}
　　
}
　　

　　2.声明java环境变量：
　　

export JAVA_HOME=/usr/java/default　　
export PATH
=${JAVA_HOME}/bin:${PATH}　　
export HADOOP_CLASSPATH
=${JAVA_HOME}/lib/tools.jar　　

　　其中JAVA_HOME根据自己安装java的实际路径进行配置。
　　注意：如果不声明以上环境变量，那么在以后运行时，将会收到错误提示：

　　3.编译并创建jar包。
　　

bin/hadoop com.sun.tools.javac.Main WordCount.java　　
jar cf wc.jar WordCount
*.class　　

　　4.运行第三步骤生成的wc.jar包。此时要注意，output文件夹不要手工创建，系统运行后会自动创建。
　　

bin/hadoop jar wc.jar WordCount /user/root/wordcount/input /user/root/wordcount/output　　

　　正常运行结束后，会在outPut文件夹下生成part-r-00000及__SUCCESS两个文件，其中part-r-00000存储分析结果。运行命令：
　　

bin/hadoop fs -cat /user/root/wordcount/output/part-r-00000　　

　　即可查看分析结果，如下图所示：

　　至此，本示例完成。

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

c++ size_t 和 int 的区别

[经验分享] Hadoop版Helloworld之wordcount运行示例

浏览过的版块

扫码加入运维网微信交流群