Configuring Hadoop on Linux

jixuaa · Posted 2015-7-13 08:02:01

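I have been learning Hadoop recently and would like to share my experience here.

I first tried to configure it with Cygwin on Windows 7, Vista and XP, but I kept running into problems, so I switched to Ubuntu 9.10.

The configuration process on Ubuntu 9.10 is described below.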

1. Configure SSH

$ sudo apt-get install openssh-server

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

$ ssh localhost

Once this login succeeds, move on to step 2.
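
Before moving on, it can help to confirm that the login really works without a password prompt. The following is a minimal sketch, assuming the OpenSSH client; the function name check_passwordless_ssh is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): succeeds only if key-based
# login to the given host works without any password prompt.
check_passwordless_ssh() {
    host=${1:-localhost}
    # BatchMode=yes makes ssh fail instead of asking for a password.
    # ConnectTimeout keeps the check from hanging on an unreachable host.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "passwordless SSH to $host: OK"
    else
        echo "passwordless SSH to $host: FAILED" >&2
        return 1
    fi
}
```

Call it as check_passwordless_ssh localhost. Because BatchMode also refuses to ask about unknown host keys, run a plain ssh localhost once first to accept the key.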

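2. Extract the Hadoop archive into your home folder, and then we can start configuring. The commands below are run from inside the extracted Hadoop directory.

2.1 Standalone Mode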

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
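
To get a feel for what the grep example computes, roughly the same result can be produced with ordinary shell tools on the local input directory: every match of the regular expression is extracted and counted. This is only a sketch of the idea, not how Hadoop implements the job; local_grep_count is a made-up name, and the -o and -h options assume GNU or BSD grep.

```shell
# Hypothetical local stand-in for the Hadoop grep example: print
# "<count><TAB><match>" for every distinct regex match found in the files
# of a directory, most frequent first.
local_grep_count() {
    dir=$1
    pattern=$2
    # -o: print only the matched text, -h: omit file names, -E: extended regex
    grep -ohE -- "$pattern" "$dir"/* |
        sort | uniq -c |
        sort -k1,1nr -k2 |
        awk '{ print $1 "\t" $2 }'
}
```

Running local_grep_count input 'dfs[a-z.]+' gives something to compare against the contents of output/.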

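2.2 Pseudo-Distributed Mode

Now for the key part. Your JDK should already be installed and configured, so I won't cover that here.

The steps below come from the English instructions I submitted to my project team, so they appear as originally written.

Remember: after bin/start-all.sh, once your program has finished running, run bin/stop-all.sh. On my machine, if I skip this, rebooting or shutting down hangs.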
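
One way to make sure the daemons are always stopped, even when the job itself fails, is to wrap the run in a small function. This is a sketch, assuming it is used from the Hadoop directory; run_with_daemons is a made-up helper, and the start and stop commands are passed in as arguments rather than hard-coded.

```shell
# Hypothetical wrapper: start the daemons, run one command, and always
# stop the daemons afterwards, returning the command's exit status.
# Usage: run_with_daemons START_CMD STOP_CMD COMMAND [ARGS...]
run_with_daemons() {
    start_cmd=$1
    stop_cmd=$2
    shift 2
    # Give up early if the daemons could not be started.
    "$start_cmd" || return 1
    "$@"
    rc=$?
    # Runs whether the command succeeded or failed.
    "$stop_cmd"
    return "$rc"
}
```

For example: run_with_daemons bin/start-all.sh bin/stop-all.sh bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'. It does not cover the case where the script is interrupted with Ctrl-C; that would need a trap.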

1. In the file conf/hadoop-env.sh, set JAVA_HOME.
2. In the file conf/core-site.xml, configure it as below:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

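3. In the file conf/hdfs-site.xml, configure it as below:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/yourname/hadoopfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/yourname/hadoopfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>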

4. In the file conf/mapred-site.xml, configure it as below:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

5. Format a new DFS:
$ bin/hadoop namenode -format
6. Start the daemons:
$ bin/start-all.sh
7. Run an experiment:
$ bin/hadoop fs -mkdir input
$ bin/hadoop fs -put conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ bin/hadoop fs -cat output/*
$ bin/stop-all.sh
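
Hadoop's site files all share one layout: a configuration element holding property elements, each with a name and a value. That makes them easy to generate from the command line. The following is a sketch; write_site_xml is a made-up helper, it overwrites the target file, and it does no XML escaping, so it only suits simple values like the ones used in the steps above.

```shell
# Hypothetical helper: write a Hadoop site file from name/value pairs.
# Usage: write_site_xml FILE NAME VALUE [NAME VALUE ...]
# Values are written verbatim (no XML escaping); FILE is overwritten.
write_site_xml() {
    file=$1
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        # Consume the remaining arguments two at a time.
        while [ "$#" -ge 2 ]; do
            echo '  <property>'
            printf '    <name>%s</name>\n' "$1"
            printf '    <value>%s</value>\n' "$2"
            echo '  </property>'
            shift 2
        done
        echo '</configuration>'
    } > "$file"
}
```

For example: write_site_xml conf/core-site.xml fs.default.name hdfs://localhost:9000. Back up the existing file first if it holds settings you want to keep.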

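By the way, Hadoop also ships with a WordCount example.

http://localhost:50070

http://localhost:50030

While the daemons are running, the first address shows the NameNode web interface, which also lists the DataNodes; the second shows the JobTracker web interface.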
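
A quick way to confirm that both pages answer, without opening a browser, is to request each one and look at the result. This is a sketch assuming curl is installed; check_web_ui is a made-up name.

```shell
# Hypothetical helper: report whether an HTTP URL answers successfully.
check_web_ui() {
    url=$1
    # -f: treat HTTP errors as failure, -s: no progress output,
    # -o /dev/null: discard the page, --max-time: give up after 5 seconds.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
        echo "$url is up"
    else
        echo "$url is not responding"
        return 1
    fi
}
```

For example: check_web_ui http://localhost:50070 and check_web_ui http://localhost:50030, run after bin/start-all.sh.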

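3. Fully-Distributed Mode

First, set up passwordless SSH login between the machines; see the post on passwordless SSH login.

Note: make sure every machine has the same user name. You can create a new user (I won't cover the steps here, since that is general Linux knowledge), or simply use the same login name when installing the other machines.

After configuring the master according to the steps below, copy all the files in conf to the slave machines.

Step 6 below may not be clearly worded. It means: remember to edit /etc/hosts, and the host names in /etc/hosts must be the machines' actual computer names.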
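
Whether a hosts file really maps the master's address to its computer name can be checked mechanically. This is a sketch; hosts_has_entry is a made-up helper, and the address and name in the usage line are placeholders.

```shell
# Hypothetical helper: succeed if FILE has a line whose first field is IP
# and one of whose following fields is exactly NAME.
# Usage: hosts_has_entry FILE IP NAME
hosts_has_entry() {
    file=$1
    ip=$2
    name=$3
    awk -v ip="$ip" -v name="$name" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        ($1 "") == (ip "") {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # ignore trailing comments
                if (($i "") == (name "")) found = 1
            }
        }
        END { exit !found }
    ' "$file"
}
```

For example, on a slave: hosts_has_entry /etc/hosts 192.168.1.10 wang-desktop && echo ok.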

Once again: when the configuration is finished, remember to copy it to every machine.

1. In the file conf/core-site.xml, configure it as below:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://[Master's IPV4]:9000</value>
  </property>
</configuration>

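2. In the file conf/hdfs-site.xml, configure it as below:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/yourname/hadoopfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/yourname/hadoopfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>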

3. In the file conf/mapred-site.xml, configure it as below:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>[Master's IPV4]:9001</value>
  </property>
</configuration>

4. Edit the files conf/masters and conf/slaves and add the machines' IPv4 addresses to them.
5. Disable IPv6 (search the Internet for instructions).
6. In the file /etc/hosts on the slave computers, the master's name must be its computer name (for example, the part after the @ in a shell prompt such as wang@wang-desktop or hadoop@clock-PC).
7. Use Eclipse 3.3.
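
Copying conf to every slave can be scripted by looping over the slaves file. This is a sketch assuming passwordless SSH already works, every machine uses the same user name, and Hadoop sits at the same path on each one; sync_conf and its arguments are made up for illustration.

```shell
# Hypothetical helper: copy the local conf directory to each host listed in
# a slaves file (one host per line; blank lines and # comments are skipped).
# Usage: sync_conf SLAVES_FILE CONF_DIR REMOTE_HADOOP_DIR
sync_conf() {
    slaves_file=$1
    conf_dir=$2
    remote_dir=$3
    # The "|| [ -n ... ]" part handles a last line with no trailing newline.
    while read -r host rest || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "copying $conf_dir to $host:$remote_dir"
        # </dev/null keeps scp from consuming the rest of the slaves file.
        scp -r "$conf_dir" "$host:$remote_dir/" </dev/null || return 1
    done < "$slaves_file"
}
```

For example, from the Hadoop directory on the master: sync_conf conf/slaves conf /home/yourname/hadoop, where the remote path is a placeholder for wherever Hadoop was extracted.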

That's all. This write-up is fairly rough, so suggestions for improving it are welcome.
