Hadoop Distributed File System
Official documentation: http://hadoop.apache.org/common/docs/r1.0.3/ (see also http://www.tbdata.org/)
Downloads: jdk-6u26-linux-x64.bin and hadoop-1.0.3.tar.gz
Hadoop runs in three modes:
Local (Standalone) Mode #standalone, single local node
Pseudo-Distributed Mode #pseudo-distributed
Fully-Distributed Mode #fully distributed
We start with a pseudo-distributed setup on a single node.
[*]chmod +x jdk-6u26-linux-x64.bin
[*]./jdk-6u26-linux-x64.bin
[*]mv jdk1.6.0_26/ /usr/local/jdk
[*]vim .bash_profile
[*]PATH=$PATH:$HOME/bin:/usr/local/jdk/bin
[*]source .bash_profile
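A quick check (my addition, not in the original steps) that the JDK is now on the PATH:
[*]java -version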
[*]useradd yejk
[*]passwd yejk
[*]cd /home/yejk
[*]vim .bash_profile
[*]PATH=$PATH:$HOME/bin:/usr/local/jdk/bin
[*]source .bash_profile
[*]cp hadoop-1.0.3.tar.gz /home/yejk/
[*]su - yejk
[*]tar zxf hadoop-1.0.3.tar.gz
[*]cd hadoop-1.0.3
Next, edit a few configuration files:
[*]vim conf/hadoop-env.sh
[*]# The java implementation to use. Required.
[*]export JAVA_HOME=/usr/local/jdk
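With JAVA_HOME set, the wrapper script should already run; printing the version is a cheap sanity check (my addition):
[*]bin/hadoop version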
[*]vim conf/core-site.xml:
[*]<configuration>
[*]    <property>
[*]        <name>fs.default.name</name>
[*]        <value>hdfs://localhost:9000</value>
[*]    </property>
[*]</configuration>
[*]vim conf/hdfs-site.xml:
[*]<configuration>
[*]    <property>
[*]        <name>dfs.replication</name>
[*]        <value>1</value>
[*]    </property>
[*]</configuration>
[*]vim conf/mapred-site.xml:
[*]<configuration>
[*]    <property>
[*]        <name>mapred.job.tracker</name>
[*]        <value>localhost:9001</value>
[*]    </property>
[*]</configuration>
Set up passwordless SSH access:
[*]ssh-keygen    # press Enter at every prompt
[*]ssh-copy-id -i ~/.ssh/id_rsa.pub localhost
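If the key copy worked, logging in to localhost should no longer prompt for a password (type exit to return):
[*]ssh localhost
[*]exit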
Format a new DFS filesystem:
[*]bin/hadoop namenode -format
[*]***************
[*]12/06/03 07:04:49 INFO common.Storage: Storage directory /tmp/hadoop-yejk/dfs/name has been successfully formatted.
[*]*****************
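The log line above shows where the name directory was created; listing it locally is a quick double check (my addition, path taken from the log):
[*]ls /tmp/hadoop-yejk/dfs/name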
Start Hadoop:
[*]bin/start-all.sh
NameNode:   http://localhost:50070/
JobTracker: http://localhost:50030/
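Not in the original walkthrough, but jps (shipped with the JDK) is a quick way to confirm the daemons started; in a healthy pseudo-distributed setup it lists NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker alongside Jps itself:
[*]jps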
Create a new directory in the filesystem:
[*]bin/hadoop fs -mkdir test
Upload the data from the local conf directory into the directory we just created:
[*]bin/hadoop fs -put conf test
[*]$ bin/hadoop fs -du
[*]Found 1 items
[*]54816 hdfs://localhost:9000/user/yejk/test
[*]$ bin/hadoop fs -ls
[*]Found 1 items
[*]drwxr-xr-x - yejk supergroup 0 2012-06-03 07:19 /user/yejk/test
Test with one of the bundled example programs:
[*]bin/hadoop jar hadoop-examples-1.0.3.jar grep test/* output 'dfs+'
This runs the example jar's grep job over all the data uploaded to test/ in DFS, counts occurrences of strings matching the regular expression 'dfs+' (i.e. strings beginning with dfs), sorts them by frequency, and saves the result in output.
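As an aside, the official 1.0.3 single-node guide runs the same job with a slightly broader regex. If you rerun it, point it at a fresh output directory, since the job refuses to overwrite an existing one (output2 below is just an example name):
[*]bin/hadoop jar hadoop-examples-1.0.3.jar grep test/* output2 'dfs[a-z.]+'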
View the results:
[*]$ bin/hadoop fs -cat output/*
[*]2 dfs.replication
[*]2 dfs.server.namenode.
[*]2 dfsadmin
[*]cat: File does not exist: /user/yejk/output/_logs
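That final "File does not exist" message is harmless: the glob also matched the job's _logs directory, which cat cannot print. Restricting it to the part files avoids the noise:
[*]bin/hadoop fs -cat 'output/part-*'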
Alternatively, fetch the output to the local filesystem:
[*]bin/hadoop fs -get output output
[*]$ cat output/part-00000
[*]2 dfs.replication
[*]2 dfs.server.namenode.
[*]2 dfsadmin
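When you are done experimenting, the companion script to start-all.sh shuts all the daemons down again:
[*]bin/stop-all.sh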