
[Experience sharing] Configuring JVM + Windows + Cygwin + Eclipse + Hadoop


Posted 2016-12-9 07:50:05
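  1 Video tutorial covering the whole process: http://v.youku.com/v_show/id_XMzc5MzM1NDQw.html
  Download: http://pan.baidu.com/share/link?shareid=211927&uk=1678594189
  2 Cygwin download site: http://www.cygwin.com
  3 Vim settings for Cygwin: http://blog.163.com/xjx_user/blog/static/21493137720130104037220/
  Note: ".vimrc" goes in your home directory. Switch to it with cd ~, then run vi .vimrc and enter the settings.
  Screenshot: DSC0000.jpg
  A .c file opened with these settings looks like this: DSC0001.jpg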
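  4 Run ssh-host-config under Cygwin (SSH, the Secure Shell protocol, encrypts data before transmission; plain FTP, POP and Telnet are not encrypted). References:
  http://blog.sina.com.cn/s/blog_62adf3670101c0bw.html
  http://www.cnblogs.com/xjx-user/archive/2013/01/09/2852201.html
  To log in over SSH, run ssh localhost; after that the who command can be used.
  5 Installing the gcc toolchain on Cygwin: http://www.cnblogs.com/xjx-user/archive/2013/01/09/2852204.html
  Note: it is usually best to run the download and the installation as two separate passes; otherwise errors are likely. Even when the download is complete, the installer may still report an error.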
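  6 Hadoop download: http://www.apache.org/dist/hadoop/core/
  7 Configuring the Hadoop plugin in Eclipse:
  http://www.cnblogs.com/xjx-user/archive/2013/01/09/2852205.html
  8 On Windows 7, connecting Eclipse to Hadoop produces a permission error; the file that needs to be changed to fix it is hadoop-core-1.0.4.jar
  URL: http://download.csdn.net/download/snow_eagle_howard/4842134
  Free download: http://pan.baidu.com/share/link?shareid=211924&uk=1678594189
  9 Starting Hadoop: go to the hadoop directory and run ./start-all.sh (in Hadoop 1.x the script lives in the bin directory). After that, ./hadoop dfsadmin -report can be run from the bin directory.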
DSC0002.jpg

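  10 WordCount code: http://www.cnblogs.com/xjx-user/archive/2013/01/09/2852205.html
  11 My WordCount run result:
DSC0003.jpg
  Note: before running, start Hadoop under Cygwin, and make sure the Cygwin service is started and SSH is usable. If a previous run left output behind, that is, the output/1 directory already exists, delete it first.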


13/01/09 01:26:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/01/09 01:26:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/01/09 01:26:13 INFO input.FileInputFormat: Total input paths to process : 5
13/01/09 01:26:14 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/09 01:26:14 INFO mapred.JobClient: Running job: job_local_0001
13/01/09 01:26:14 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
13/01/09 01:26:14 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:14 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:14 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:14 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:14 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:14 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/01/09 01:26:15 INFO mapred.JobClient:  map 0% reduce 0%
13/01/09 01:26:17 INFO mapred.LocalJobRunner:
13/01/09 01:26:17 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/01/09 01:26:17 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
13/01/09 01:26:17 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:17 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:17 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:17 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:17 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:17 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
13/01/09 01:26:18 INFO mapred.JobClient:  map 100% reduce 0%
13/01/09 01:26:20 INFO mapred.LocalJobRunner:
13/01/09 01:26:20 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
13/01/09 01:26:20 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
13/01/09 01:26:20 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:20 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:20 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:20 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:20 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:20 INFO mapred.Task: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
13/01/09 01:26:23 INFO mapred.LocalJobRunner:
13/01/09 01:26:23 INFO mapred.Task: Task 'attempt_local_0001_m_000002_0' done.
13/01/09 01:26:23 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
13/01/09 01:26:23 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:23 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:23 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:23 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:23 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:23 INFO mapred.Task: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
13/01/09 01:26:26 INFO mapred.LocalJobRunner:
13/01/09 01:26:26 INFO mapred.Task: Task 'attempt_local_0001_m_000003_0' done.
13/01/09 01:26:26 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
13/01/09 01:26:26 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:26 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:26 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:26 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:26 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:26 INFO mapred.Task: Task:attempt_local_0001_m_000004_0 is done. And is in the process of commiting
13/01/09 01:26:29 INFO mapred.LocalJobRunner:
13/01/09 01:26:29 INFO mapred.Task: Task 'attempt_local_0001_m_000004_0' done.
13/01/09 01:26:29 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
13/01/09 01:26:29 INFO mapred.LocalJobRunner:
13/01/09 01:26:29 INFO mapred.Merger: Merging 5 sorted segments
13/01/09 01:26:29 INFO mapred.Merger: Down to the last merge-pass, with 5 segments left of total size: 2065 bytes
13/01/09 01:26:29 INFO mapred.LocalJobRunner:
13/01/09 01:26:29 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/01/09 01:26:29 INFO mapred.LocalJobRunner:
13/01/09 01:26:29 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/01/09 01:26:29 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /mapreduce/wordcount/output/1
13/01/09 01:26:32 INFO mapred.LocalJobRunner: reduce > reduce
13/01/09 01:26:32 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/01/09 01:26:33 INFO mapred.JobClient:  map 100% reduce 100%
13/01/09 01:26:33 INFO mapred.JobClient: Job complete: job_local_0001
13/01/09 01:26:33 INFO mapred.JobClient: Counters: 19
13/01/09 01:26:33 INFO mapred.JobClient:   File Output Format Counters
13/01/09 01:26:33 INFO mapred.JobClient:     Bytes Written=1485
13/01/09 01:26:33 INFO mapred.JobClient:   FileSystemCounters
13/01/09 01:26:33 INFO mapred.JobClient:     FILE_BYTES_READ=6117827
13/01/09 01:26:33 INFO mapred.JobClient:     HDFS_BYTES_READ=4960
13/01/09 01:26:33 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6423845
13/01/09 01:26:33 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1485
13/01/09 01:26:33 INFO mapred.JobClient:   File Input Format Counters
13/01/09 01:26:33 INFO mapred.JobClient:     Bytes Read=1036
13/01/09 01:26:33 INFO mapred.JobClient:   Map-Reduce Framework
13/01/09 01:26:33 INFO mapred.JobClient:     Map output materialized bytes=2085
13/01/09 01:26:33 INFO mapred.JobClient:     Map input records=15
13/01/09 01:26:33 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/01/09 01:26:33 INFO mapred.JobClient:     Spilled Records=216
13/01/09 01:26:33 INFO mapred.JobClient:     Map output bytes=1835
13/01/09 01:26:33 INFO mapred.JobClient:     Total committed heap usage (bytes)=986734592
13/01/09 01:26:33 INFO mapred.JobClient:     SPLIT_RAW_BYTES=605
13/01/09 01:26:33 INFO mapred.JobClient:     Combine input records=0
13/01/09 01:26:33 INFO mapred.JobClient:     Reduce input records=108
13/01/09 01:26:33 INFO mapred.JobClient:     Reduce input groups=87
13/01/09 01:26:33 INFO mapred.JobClient:     Combine output records=0
13/01/09 01:26:33 INFO mapred.JobClient:     Reduce output records=87
13/01/09 01:26:33 INFO mapred.JobClient:     Map output records=108
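
The counters at the end of the log can be cross-checked. Assuming the job ran without a combiner (as Combine input records=0 suggests), every map output record reaches the reducer, so Map output records and Reduce input records should match (both 108 here). A minimal sketch in plain Java, with no Hadoop dependency, that pulls the name=value counters out of log lines in the format shown above; the class and method names are hypothetical and not part of Hadoop:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop class): extracts "name=value" counters
// from JobClient log lines such as "... mapred.JobClient:     Map output records=108".
class JobCounterCheck {
    // Captures the counter name and its numeric value after "mapred.JobClient:".
    private static final Pattern COUNTER =
            Pattern.compile("mapred\\.JobClient:\\s+(.+?)=(\\d+)\\s*$");

    static Map<String, Long> parse(String[] logLines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                counters.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // A few lines copied from the log above.
        String[] lines = {
            "13/01/09 01:26:33 INFO mapred.JobClient:     Reduce input records=108",
            "13/01/09 01:26:33 INFO mapred.JobClient:     Reduce input groups=87",
            "13/01/09 01:26:33 INFO mapred.JobClient:     Reduce output records=87",
            "13/01/09 01:26:33 INFO mapred.JobClient:     Map output records=108"
        };
        Map<String, Long> c = parse(lines);
        boolean mapMatchesReduce =
                c.get("Map output records").equals(c.get("Reduce input records"));
        boolean groupsMatchOutput =
                c.get("Reduce input groups").equals(c.get("Reduce output records"));
        System.out.println("map output == reduce input: " + mapMatchesReduce);
        System.out.println("reduce groups == reduce output: " + groupsMatchOutput);
    }
}
```

The second check holds only for reducers that write exactly one record per key, which is what a summing word-count reducer does.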



  12 Programmatically operating on files in HDFS
  Code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line on whitespace and emits (token, 1).
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Expects exactly two arguments: the input path and the output path.
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // The output directory must not exist yet, or the job fails at submission.
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
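
To see what the mapper and reducer above compute without starting a cluster, the same logic can be imitated in plain Java. This is only a sketch under simplifying assumptions: it runs in one process, uses no Hadoop classes, and the class and method names (LocalWordCount, mapTokens, count) are made up for illustration. A TreeMap stands in for the sorted key order of the reduce output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process imitation of the WordCount job; not Hadoop code.
class LocalWordCount {
    // "Map" step: split one line on whitespace, as TokenizerMapper does.
    static List<String> mapTokens(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            tokens.add(itr.nextToken());
        }
        return tokens;
    }

    // "Reduce" step: add 1 for every occurrence of a token, as IntSumReducer does.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : mapTokens(line)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            // Tab-separated key and value, one pair per line.
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because StringTokenizer splits only on whitespace, tokens are case-sensitive and keep any attached punctuation, so "Hello," and "hello" are counted as different words.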




  Run result: DSC0006.jpg
  13 Reading and writing a SequenceFile. Only writing is implemented here (reading and writing a MapFile is similar):
  Code:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
    // Sample values; the loop below cycles through them.
    private static final String[] DATA = {
        "One,two,buckle my shoe",
        "Three,four,shut the door",
        "Five,six,pick up sticks",
        "Seven,eight,lay them straight",
        "Nine,ten,a big fat hen"
    };

    public static void main(String[] args) throws Exception {
        // args[0] is the URI of the SequenceFile to create.
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);
        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
            // Write 100 records with keys counting down from 100 to 1.
            for (int i = 0; i < 100; i++) {
                key.set(100 - i);
                value.set(DATA[i % DATA.length]);
                // Print the current file position, then the key and value being appended.
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            // Close the writer even if an append fails.
            IOUtils.closeStream(writer);
        }
    }
}




  Result of running it in Eclipse:
DSC0007.jpg

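  Afterwards, view the file with the read command under Cygwin (it can also be read programmatically). Note that this is a SequenceFile, a binary format, so opening it directly in Notepad on Windows shows garbled characters: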
DSC0008.jpg
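
One quick way to confirm that a file which looks garbled in Notepad really is a SequenceFile: the format begins with the three ASCII bytes "SEQ" followed by a version byte. The sketch below checks for that magic in plain Java. It assumes the file has first been copied from HDFS to the local disk (for example with hadoop fs -get); the class and method names are hypothetical, and it does not decode any records.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: checks only the "SEQ" magic, nothing else in the header.
class SequenceFileMagic {
    // True if the given leading bytes start with 'S', 'E', 'Q'.
    static boolean hasSequenceFileMagic(byte[] head) {
        return head.length >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q';
    }

    // Reads up to three bytes from a local file and applies the check above.
    static boolean looksLikeSequenceFile(Path localFile) {
        byte[] head = new byte[3];
        int read = 0;
        try (InputStream in = Files.newInputStream(localFile)) {
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // file is shorter than the magic
                }
                read += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hasSequenceFileMagic(Arrays.copyOf(head, read));
    }

    public static void main(String[] args) {
        // args[0]: path to a local copy of the file written by SequenceFileWriteDemo.
        Path p = Paths.get(args[0]);
        System.out.println(p + (looksLikeSequenceFile(p)
                ? " starts with the SEQ magic"
                : " does not start with the SEQ magic"));
    }
}
```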

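  Hadoop web user interfaces:
  JobTracker: (http://jobtracker-host:50030), for tracking job progress and viewing job statistics and logs; on a local setup: http://localhost:50030/
  NameNode: (http://namenode-host:50070), for viewing basic NameNode status, the contents of HDFS, and the NameNode logs; on a local setup: http://localhost:50070/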
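
If a page does not load, it helps to check whether anything is listening on the port at all. The following is a rough sketch in plain Java that attempts a TCP connection to the two ports listed above on localhost; the class name, method names and the 2-second timeout are arbitrary choices for illustration. An open port only shows that some process accepted the connection, not that it is the Hadoop daemon.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe for the Hadoop 1.x web UI ports mentioned above.
class HadoopUiProbe {
    // Builds the URL to open in a browser.
    static String uiUrl(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    // True if a TCP connection to host:port succeeds within the timeout.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not resolvable
        }
    }

    public static void main(String[] args) {
        String[] names = {"JobTracker", "NameNode"};
        int[] ports = {50030, 50070};
        for (int i = 0; i < names.length; i++) {
            boolean open = isPortOpen("localhost", ports[i], 2000);
            System.out.println(names[i] + " " + uiUrl("localhost", ports[i])
                    + (open ? " is accepting connections" : " is not reachable"));
        }
    }
}
```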
