Hadoop + Eclipse Environment Setup

hyadijxp · Posted 2015-7-12 12:56:56

　　http://blog.sina.com.cn/s/blog_537770820100bxmf.html
　　http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
　　http://hadoop.apache.org/common/docs/current/single_node_setup.html
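　　Hadoop single-node configuration:
　　Basic setup: Ubuntu 10.10, JDK 1.6, hadoop-0.21.0, Eclipse
　　1. When installing Ubuntu, select the SSH Server so that you do not have to install SSH separately later; Hadoop's start and stop scripts use SSH to reach each machine. Do not select the JDK offered by the Ubuntu installer, i.e. Virtual Machine Host (BSD OpenJDK); it is of no use here, because we need the Sun JDK.
　　2. If you did not install the SSH Server, install it with:
sudo apt-get install openssh-server openssh-client
Stop ssh: /etc/init.d/ssh stop
Start ssh: /etc/init.d/ssh start
Restart ssh: /etc/init.d/ssh restart
With ssh installed, you can also reach Ubuntu from SecureCRT, which is more convenient than logging in at the console.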
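　　On every machine, create the .ssh directory (logged in as root, so it ends up at /root/.ssh/):
　　$ mkdir .ssh
　　Generate a key pair on ubuntu01:
　　$ ssh-keygen -t rsa
　　Press Enter at every prompt to generate the key pair (id_rsa, id_rsa.pub). The pair is stored in /root/.ssh; to see it in a file manager, enable the display of hidden files.
　　Then copy the contents of each machine's id_rsa.pub into the authorized_keys file (the content of id_rsa.pub is one long line; do not drop characters or introduce extra line breaks when copying):
　　$ cd .ssh
　　$ cp id_rsa.pub authorized_keys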
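　　Then copy authorized_keys to the other machines, so that ubuntu01 through ubuntu03 all have it:
　　$ scp authorized_keys ubuntu02:/root/.ssh
　　scp copies files remotely over ssh. It asks for a password on the remote host, namely that of the account you connect as on ubuntu02 (root here, since the command names no user and you are logged in as root). You can of course use some other method to get authorized_keys onto the other machines.
　　On every machine, run:
　　$ chmod 640 authorized_keys
　　SSH is now configured on all machines and can be tested, for example by opening an ssh connection from ubuntu01 to ubuntu02:
　　$ ssh ubuntu02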
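　　If ssh is configured correctly, the following prompt appears:
　　The authenticity of host [ubuntu02] can't be established.
　　Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
　　Are you sure you want to continue connecting (yes/no)?
　　This appears because it is the first login to this host. Type "yes". The prompt is not shown again on later connections to the same host.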
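　　The key steps above can be rehearsed without touching the real /root/.ssh. The following is only a sketch: SSH_DIR is a hypothetical scratch directory, the key is created with an empty passphrase (the equivalent of pressing Enter at every prompt), and the public key is appended rather than copied so that keys from other machines could be added to the same file.

```shell
#!/bin/sh
# Sketch: generate a key pair and build authorized_keys in a scratch directory.
# SSH_DIR is a hypothetical location; the real files live in /root/.ssh.
set -eu

SSH_DIR="${SSH_DIR:-$(mktemp -d)}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"            # keep the directory private to its owner

# -N "" sets an empty passphrase, -f names the key file, -q suppresses output
ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"

# Append instead of overwrite, so more public keys can be added later
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 640 "$SSH_DIR/authorized_keys"

# Each public key must occupy exactly one line
KEY_LINES=$(wc -l < "$SSH_DIR/authorized_keys" | tr -d ' ')
echo "authorized_keys holds $KEY_LINES key(s) in $SSH_DIR"
```

　　Counting lines is a cheap way to catch the stray line breaks mentioned above: the count should equal the number of machines whose keys were added.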

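　　3. Now install the JDK
　　The command is:
sudo apt-get install sun-java6-jdk
If you are not sure whether a JDK is already installed, check with: java -version
If the reported Java version is not Sun's, or the shell says that java is not a known command, you need to install it. Alternatively, download the JDK and install it directly.
　　A word on environment variables: a JDK installed this way goes to /usr/lib/jvm/java-6-sun by default, including the executables and the class libraries; you can look around with cd /usr/lib/jvm/java-6-sun.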
　　I configured two places, the /etc/environment file and the ~/.bashrc file, as follows:
/etc/environment:
CLASSPATH=/usr/lib/jvm/java-6-sun/lib
JAVA_HOME=/usr/lib/jvm/java-6-sun
　　Append to the end of ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=.:$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
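　　Note: the new PATH value must include $PATH; otherwise commands such as vi and sudo can no longer be found and have to be run by their full path (for example /usr/bin/vi).
Also, the separator on Linux is ":", not ";" as on Windows, which is an easy mistake for newcomers.
　　After adding these variables, use the echo command to check that they are correct: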
echo $PATH
echo $CLASSPATH
echo $JAVA_HOME
　　Check that the output matches what you set.
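　　The same check can be scripted. This is a minimal sketch, assuming the JDK path used above; path_contains is a hypothetical helper written for this example, not a standard command. It confirms that the JDK directory was added to PATH and that the previous entries survived.

```shell
#!/bin/sh
# Sketch: set the Java variables as in ~/.bashrc and verify that PATH was extended, not replaced.
set -eu

JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-6-sun}"
OLD_PATH="$PATH"
PATH=".:$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin"
export JAVA_HOME PATH

# path_contains (hypothetical helper): succeeds if $2 is one of the ':'-separated entries of $1
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains "$PATH" "$JAVA_HOME/bin"; then
    echo "JAVA_HOME/bin is on PATH"
fi

# The old entries must still be present, or vi, sudo and other commands disappear
case "$PATH" in
    .:"$OLD_PATH":*) echo "original PATH preserved" ;;
    *)               echo "original PATH was lost" ;;
esac

# Report whether a JDK really exists at that location on this machine
if [ -x "$JAVA_HOME/bin/java" ]; then
    "$JAVA_HOME/bin/java" -version
else
    echo "no java binary under $JAVA_HOME (adjust JAVA_HOME)"
fi
```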
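　　4. Configure the following files in Hadoop's conf directory
　　conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

　　conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

　　conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

　　In conf/hadoop-env.sh, add the JDK path. Use the directory where your JDK is actually installed; with the package install from step 3 that is /usr/lib/jvm/java-6-sun rather than the path below:
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_10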
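　　The three site files listed above each carry a single property, so they can be generated rather than typed. A minimal sketch follows; CONF_DIR and write_site are hypothetical names for this example, and the files go to a scratch directory unless CONF_DIR points at the real conf directory.

```shell
#!/bin/sh
# Sketch: write the three single-node site files with one property each.
# CONF_DIR is a hypothetical target; set it to Hadoop's conf directory to use the result.
set -eu

CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# write_site (hypothetical helper): file name, property name, property value
write_site() {
    cat > "$CONF_DIR/$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

write_site core-site.xml   fs.default.name    hdfs://localhost:9000
write_site hdfs-site.xml   dfs.replication    1
write_site mapred-site.xml mapred.job.tracker localhost:9001

ls "$CONF_DIR"
```

　　Note that the script overwrites any existing file of the same name in CONF_DIR, so back up a conf directory that already holds other properties.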
　　5. Running Hadoop:
　　Format a new distributed-filesystem:
$ bin/hadoop namenode -format

　　Start the hadoop daemons:
$ bin/start-all.sh

　　The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

　　Browse the web interface for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/

　　Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

　　Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

　　Examine the output files:

　　Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

　　or

　　View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

　　When you're done, stop the daemons with:
$ bin/stop-all.sh
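
　　The whole run-through can be collected into one script, to be started from the Hadoop install directory. This is only a sketch: DRY_RUN and run are hypothetical names for this example, and by default the script just prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: the commands of step 5 in order.
# DRY_RUN and run are hypothetical; with DRY_RUN=1 (the default) nothing is executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print the command, then execute it unless this is a dry run
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

run bin/hadoop namenode -format
run bin/start-all.sh
run bin/hadoop fs -put conf input
run bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
# Quoted so that the glob is expanded by HDFS rather than by the local shell
run bin/hadoop fs -cat 'output/*'
run bin/stop-all.sh
```

　　Run it with DRY_RUN=0 to execute the commands for real; because of set -e, the script stops at the first command that fails.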
