Posted by gdx on 2018-10-31 10:07:07

CentOS 6.4 + hadoop-0.20.2 Installation Log
  Download site:
  http://mirror.bit.edu.cn/apache/
  Documentation:
  http://hadoop.apache.org/docs/
  Node plan:
  10.10.1.131  hadoop1
  10.10.1.132  hadoop2
  10.10.1.133  hadoop3
  10.10.1.170  dog
  10.10.1.171  cat
  10.10.1.172  gangster
  1. Unpack and install
  On the master node:
  Verify that the JDK is installed
  # java -version
  java version "1.7.0_09-icedtea"
  OpenJDK Runtime Environment (rhel-2.3.4.1.el6_3-x86_64)
  OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
  Verify that SSH is installed
  # ssh -version
  OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010
  Bad escape character 'rsion'.
  (The warning appears because -version is not a valid option; ssh parses it as -v plus -e rsion. Use ssh -V to print just the version string.)
  On every node:
  # vi /etc/hosts
  10.10.1.131  hadoop1
  10.10.1.132  hadoop2
  10.10.1.133  hadoop3
  10.10.1.170  dog
  10.10.1.171  cat
  10.10.1.172  gangster
  # useradd hadoop
  # passwd hadoop
  # vi /etc/sysconfig/iptables
  -A INPUT -s 10.10.1.131 -j ACCEPT
  -A INPUT -s 10.10.1.132 -j ACCEPT
  -A INPUT -s 10.10.1.133 -j ACCEPT
  -A INPUT -s 10.10.1.170 -j ACCEPT
  -A INPUT -s 10.10.1.171 -j ACCEPT
  -A INPUT -s 10.10.1.172 -j ACCEPT
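  After saving the rules, reload the firewall so they take effect. A minimal sketch, assuming the stock CentOS 6 init script:
  # service iptables restart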
  2. Install hadoop
  Upload the downloaded hadoop-0.20.2.tar.gz to /home/hadoop
  $ tar xzvf hadoop-0.20.2.tar.gz
  -rw-r--r--. 1 hadoop hadoop 44575568 Feb 16 21:34 hadoop-0.20.2.tar.gz
  $ cd hadoop-0.20.2
  $ ll
  total 4872
  drwxr-xr-x.  2 hadoop hadoop    4096 Feb 16 21:37 bin
  -rw-rw-r--.  1 hadoop hadoop   74035 Feb 19  2010 build.xml
  drwxr-xr-x.  4 hadoop hadoop    4096 Feb 19  2010 c++
  -rw-rw-r--.  1 hadoop hadoop  348624 Feb 19  2010 CHANGES.txt
  drwxr-xr-x.  2 hadoop hadoop    4096 Feb 16 21:37 conf
  drwxr-xr-x. 13 hadoop hadoop    4096 Feb 19  2010 contrib
  drwxr-xr-x.  7 hadoop hadoop    4096 Feb 16 21:37 docs
  -rw-rw-r--.  1 hadoop hadoop    6839 Feb 19  2010 hadoop-0.20.2-ant.jar
  -rw-rw-r--.  1 hadoop hadoop 2689741 Feb 19  2010 hadoop-0.20.2-core.jar
  -rw-rw-r--.  1 hadoop hadoop  142466 Feb 19  2010 hadoop-0.20.2-examples.jar
  -rw-rw-r--.  1 hadoop hadoop 1563859 Feb 19  2010 hadoop-0.20.2-test.jar
  -rw-rw-r--.  1 hadoop hadoop   69940 Feb 19  2010 hadoop-0.20.2-tools.jar
  drwxr-xr-x.  2 hadoop hadoop    4096 Feb 16 21:37 ivy
  -rw-rw-r--.  1 hadoop hadoop    8852 Feb 19  2010 ivy.xml
  drwxr-xr-x.  5 hadoop hadoop    4096 Feb 16 21:37 lib
  drwxr-xr-x.  2 hadoop hadoop    4096 Feb 16 21:37 librecordio
  -rw-rw-r--.  1 hadoop hadoop   13366 Feb 19  2010 LICENSE.txt
  -rw-rw-r--.  1 hadoop hadoop     101 Feb 19  2010 NOTICE.txt
  -rw-rw-r--.  1 hadoop hadoop    1366 Feb 19  2010 README.txt
  drwxr-xr-x. 15 hadoop hadoop    4096 Feb 16 21:37 src
  drwxr-xr-x.  8 hadoop hadoop    4096 Feb 19  2010 webapps
  3. Configure passwordless SSH from the master hadoop user to the slave nodes
  Generate the master node's ssh key pair on the MASTER node
  # su - hadoop
  $ ssh-keygen -t rsa
  Generating public/private rsa key pair.
  Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
  Created directory '/home/hadoop/.ssh'.
  Enter passphrase (empty for no passphrase):
  Enter same passphrase again:
  Your identification has been saved in /home/hadoop/.ssh/id_rsa.
  Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
  The key fingerprint is:
  b9:5c:48:74:25:33:ac:9f:11:c9:77:5e:02:43:3b:ba hadoop@hadoop1.cfzq.com
  The key's randomart image is:
  +--[ RSA 2048]----+
  |      .o=+=.   |
  |       . .=+.oo .|
  |      .. ooo o |
  |       ..o.. ..|
  |      S.oo   |
  |       . oo.   |
  |      o E      |
  |               |
  |               |
  +-----------------+
  Copy the public key to each SLAVE node
  $ scp id_rsa.pub hadoop@hadoop2:~/master-key
  The authenticity of host 'hadoop2 (10.10.1.132)' can't be established.
  RSA key fingerprint is f9:47:3e:59:39:10:cd:7d:a4:5c:0d:ab:df:1f:14:21.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'hadoop2,10.10.1.132' (RSA) to the list of known hosts.
  hadoop@hadoop2's password:
  id_rsa.pub    100%  405   0.4KB/s   00:00
  $ scp id_rsa.pub hadoop@hadoop3:~/master-key
  The authenticity of host 'hadoop3 (10.10.1.133)' can't be established.
  RSA key fingerprint is f9:47:3e:59:39:10:cd:7d:a4:5c:0d:ab:df:1f:14:21.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'hadoop3,10.10.1.133' (RSA) to the list of known hosts.
  hadoop@hadoop3's password:
  id_rsa.pub    100%  405   0.4KB/s   00:00
  $ scp id_rsa.pub hadoop@cat:~/master-key
  The authenticity of host 'cat (10.10.1.171)' can't be established.
  RSA key fingerprint is f9:47:3e:59:39:10:cd:7d:a4:5c:0d:ab:df:1f:14:21.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'cat,10.10.1.171' (RSA) to the list of known hosts.
  hadoop@cat's password:
  id_rsa.pub    100%  405   0.4KB/s   00:00
  $ scp id_rsa.pub hadoop@dog:~/master-key
  The authenticity of host 'dog (10.10.1.170)' can't be established.
  RSA key fingerprint is f9:47:3e:59:39:10:cd:7d:a4:5c:0d:ab:df:1f:14:21.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'dog,10.10.1.170' (RSA) to the list of known hosts.
  hadoop@dog's password:
  id_rsa.pub    100%  405   0.4KB/s   00:00
  $ scp id_rsa.pub hadoop@gangster:~/master-key
  The authenticity of host 'gangster (10.10.1.172)' can't be established.
  RSA key fingerprint is f9:47:3e:59:39:10:cd:7d:a4:5c:0d:ab:df:1f:14:21.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'gangster,10.10.1.172' (RSA) to the list of known hosts.
  hadoop@gangster's password:
  id_rsa.pub    100%  405   0.4KB/s   00:00
  On each SLAVE node (hadoop2, hadoop3, cat, dog, gangster), repeat the same five commands:
  $ mkdir .ssh
  $ chmod 700 .ssh/
  $ mv master-key .ssh/authorized_keys
  $ cd .ssh/
  $ chmod 600 authorized_keys
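  (On systems where the openssh-clients package ships ssh-copy-id, the copy-and-install steps above can be collapsed into one command per node; a hedged alternative, not what the original session did:)
  $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop2
  ssh-copy-id creates the remote ~/.ssh if needed and appends the key to authorized_keys.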
  From the MASTER node, verify passwordless login to each slave node
  $ ssh hadoop2
  Last login: Sat Feb 15 19:35:21 2014 from hadoop1
  $ exit
  logout
  Connection to hadoop2 closed.
  $ ssh hadoop3
  Last login: Sat Feb 15 21:57:38 2014 from hadoop1
  $ exit
  logout
  Connection to hadoop3 closed.
  $ ssh cat
  Last login: Sat Feb 15 14:33:50 2014 from hadoop1
  $ exit
  logout
  Connection to cat closed.
  $ ssh dog
  Last login: Sun Feb 16 20:41:19 2014 from hadoop1
  $ exit
  logout
  Connection to dog closed.
  $ ssh gangster
  Last login: Sat Feb 15 18:03:45 2014 from hadoop1
  $ exit
  logout
  Connection to gangster closed.
  4. Configure runtime parameters
  Several key parameters need to be set:
  1. fs.default.name
  2. hadoop.tmp.dir
  3. mapred.job.tracker
  4. dfs.name.dir
  5. dfs.data.dir
  6. dfs.http.address
  # su - hadoop
  $ cd hadoop-0.20.2/conf/
  $ vi hadoop-env.sh
  export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
  export HADOOP_LOG_DIR=/var/log/hadoop
  On each node (as root; the directory must exist before it can be opened up):
  # mkdir -p /var/log/hadoop
  # chmod 777 /var/log/hadoop
  Test that JAVA_HOME is set correctly
  $ ../bin/hadoop version
  Hadoop 0.20.2
  Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
  Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
  $ vi core-site.xml
  <configuration>
    <!-- filesystem address and port -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://10.10.1.131:9000</value>
    </property>
  </configuration>
  In hdfs-site.xml:
  dfs.http.address is the address and port of the web UI; the default port is 50070 and the IP is the namenode's.
  dfs.data.dir is where each datanode stores its data; if unset, data goes under the hadoop.tmp.dir configured in core-site.xml.
  dfs.name.dir is where the namenode stores its metadata; if unset, it likewise falls back to hadoop.tmp.dir.
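  Note that hadoop.tmp.dir appears in the parameter list above but is never set in this walkthrough, so anything not pinned by dfs.name.dir/dfs.data.dir falls back to /tmp/hadoop-${user.name} (visible later in the NameNode Storage panel). A hedged sketch of pinning it in core-site.xml, reusing the tmp directory created below:
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-0.20.2/tmp</value>
  </property>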
  $ pwd
  /home/hadoop/hadoop-0.20.2
  $ mkdir data
  $ mkdir name
  $ mkdir tmp
  $ vi hdfs-site.xml
  <configuration>
    <property>
      <name>dfs.name.dir</name>
      <value>/home/hadoop/hadoop-0.20.2/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/home/hadoop/hadoop-0.20.2/data</value>
    </property>
    <property>
      <name>dfs.http.address</name>
      <value>10.10.1.131:50071</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
      <description>The actual number of replications can be specified when the file is created.</description>
    </property>
  </configuration>
  $ vi mapred-site.xml
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>10.10.1.131:9001</value>
      <description>The host and port that the MapReduce job tracker runs at.</description>
    </property>
  </configuration>
  $ vi masters
  hadoop1
  $ vi slaves
  hadoop2
  hadoop3
  5. Copy hadoop to each datanode
  $ pwd
  /home/hadoop
  $ scp -r hadoop-0.20.2 hadoop2:/home/hadoop/.
  $ scp -r hadoop-0.20.2 hadoop3:/home/hadoop/.
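  To confirm the copies landed intact, a quick hedged check (md5sum over the config files; a differing hash means a stale copy):
  $ for h in hadoop2 hadoop3; do echo "== $h =="; ssh $h 'md5sum /home/hadoop/hadoop-0.20.2/conf/*-site.xml'; done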
  6. Format the distributed filesystem
  On the namenode:
  $ pwd
  /home/hadoop/hadoop-0.20.2/bin
  $ ./hadoop namenode -format
  14/02/18 20:23:24 INFO namenode.NameNode: STARTUP_MSG:
  /************************************************************
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG:   host = hadoop1.cfzq.com/10.10.1.131
  STARTUP_MSG:   args = [-format]
  STARTUP_MSG:   version = 0.20.2
  STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
  ************************************************************/
  14/02/18 20:23:38 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
  14/02/18 20:23:38 INFO namenode.FSNamesystem: supergroup=supergroup
  14/02/18 20:23:38 INFO namenode.FSNamesystem: isPermissionEnabled=true
  14/02/18 20:23:39 INFO common.Storage: Image file of size 96 saved in 0 seconds.
  14/02/18 20:23:39 INFO common.Storage: Storage directory /home/hadoop/hadoop-0.20.2/name has been successfully formatted.
  14/02/18 20:23:39 INFO namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
  SHUTDOWN_MSG: Shutting down NameNode at hadoop1.cfzq.com/10.10.1.131
  ************************************************************/
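  A hedged way to confirm the format populated dfs.name.dir (the expected file names below are the standard 0.20-era layout, not copied from the original session):
  $ ls /home/hadoop/hadoop-0.20.2/name/current/
  # expect: VERSION  edits  fsimage  fstime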
  7. Start the daemons
  $ pwd
  /home/hadoop/hadoop-0.20.2/bin
  $ ./start-all.sh
  starting namenode, logging to /var/log/hadoop/hadoop-hadoop-namenode-hadoop1.cfzq.com.out
  hadoop3: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-hadoop3.cfzq.com.out
  hadoop2: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-hadoop2.cfzq.com.out
  hadoop1: starting secondarynamenode, logging to /var/log/hadoop/hadoop-hadoop-secondarynamenode-hadoop1.cfzq.com.out
  starting jobtracker, logging to /var/log/hadoop/hadoop-hadoop-jobtracker-hadoop1.cfzq.com.out
  hadoop3: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-hadoop3.cfzq.com.out
  hadoop2: starting tasktracker, logging to /var/log/hadoop/hadoop-hadoop-tasktracker-hadoop2.cfzq.com.out
  Check that the processes started on each node
  $ /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/jps
  30042 Jps
  29736 NameNode
  29885 SecondaryNameNode
  29959 JobTracker
  $ jps
  13437 TaskTracker
  13327 DataNode
  13481 Jps
  $ jps
  12117 Jps
  11962 DataNode
  12065 TaskTracker
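  For convenience, all three nodes can be polled from the master in one loop; a sketch assuming the same JDK path on every node:
  $ for h in hadoop1 hadoop2 hadoop3; do echo "== $h =="; ssh $h /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/jps; done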
  8. Test the hadoop cluster
  $ pwd
  /home/hadoop/hadoop-0.20.2
  $ mkdir input
  $ cd input
  $ echo "hello world" >test1.txt
  $ echo "hello hadoop" >test2.txt
  $ ll
  total 8
  -rw-rw-r--. 1 hadoop hadoop 12 Feb 18 09:32 test1.txt
  -rw-rw-r--. 1 hadoop hadoop 13 Feb 18 09:33 test2.txt
  Copy the files under the input directory into HDFS
  $ cd ../bin/
  $ pwd
  /home/hadoop/hadoop-0.20.2/bin
  $ ./hadoop dfs -put ../input in
  dfs: operate on the distributed filesystem
  -put: copy local files into the distributed filesystem
  ../input: the source directory on the local filesystem
  in: the target directory in the distributed filesystem
  $ ./hadoop dfs -ls ./in/*
  -rw-r--r--   2 hadoop supergroup         12 2014-02-18 10:11 /user/hadoop/in/test1.txt
  -rw-r--r--   2 hadoop supergroup         13 2014-02-18 10:11 /user/hadoop/in/test2.txt
  /user/hadoop/in is the in directory under user hadoop's HDFS home, not an operating-system path.
  -ls: list files and directories in the distributed filesystem
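  The inverse operations work the same way; a brief sketch (the local target path is illustrative):
  $ ./hadoop dfs -cat in/test1.txt      # print an HDFS file to stdout
  $ ./hadoop dfs -get in /tmp/in-copy   # copy the HDFS directory back to the local filesystem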
  $ pwd
  /home/hadoop/hadoop-0.20.2
  $ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
  14/02/18 11:15:58 INFO input.FileInputFormat: Total input paths to process : 2
  14/02/18 11:15:59 INFO mapred.JobClient: Running job: job_201402180923_0001
  14/02/18 11:16:00 INFO mapred.JobClient:  map 0% reduce 0%
  14/02/18 11:16:07 INFO mapred.JobClient:  map 50% reduce 0%
  14/02/18 11:16:10 INFO mapred.JobClient:  map 100% reduce 0%
  14/02/18 11:16:19 INFO mapred.JobClient:  map 100% reduce 100%
  14/02/18 11:16:21 INFO mapred.JobClient: Job complete: job_201402180923_0001
  14/02/18 11:16:21 INFO mapred.JobClient: Counters: 17
  14/02/18 11:16:21 INFO mapred.JobClient:   Job Counters
  14/02/18 11:16:21 INFO mapred.JobClient:   Launched reduce tasks=1
  14/02/18 11:16:21 INFO mapred.JobClient:   Launched map tasks=2
  14/02/18 11:16:21 INFO mapred.JobClient:   Data-local map tasks=2
  14/02/18 11:16:21 INFO mapred.JobClient:   FileSystemCounters
  14/02/18 11:16:21 INFO mapred.JobClient:   FILE_BYTES_READ=55
  14/02/18 11:16:21 INFO mapred.JobClient:   HDFS_BYTES_READ=25
  14/02/18 11:16:21 INFO mapred.JobClient:   FILE_BYTES_WRITTEN=180
  14/02/18 11:16:21 INFO mapred.JobClient:   HDFS_BYTES_WRITTEN=25
  14/02/18 11:16:21 INFO mapred.JobClient:   Map-Reduce Framework
  14/02/18 11:16:21 INFO mapred.JobClient:   Reduce input groups=3
  14/02/18 11:16:21 INFO mapred.JobClient:   Combine output records=4
  14/02/18 11:16:21 INFO mapred.JobClient:   Map input records=2
  14/02/18 11:16:21 INFO mapred.JobClient:   Reduce shuffle bytes=61
  14/02/18 11:16:21 INFO mapred.JobClient:   Reduce output records=3
  14/02/18 11:16:21 INFO mapred.JobClient:   Spilled Records=8
  14/02/18 11:16:21 INFO mapred.JobClient:   Map output bytes=41
  14/02/18 11:16:21 INFO mapred.JobClient:   Combine input records=4
  14/02/18 11:16:21 INFO mapred.JobClient:   Map output records=4
  14/02/18 11:16:21 INFO mapred.JobClient:   Reduce input records=4
  jar: submit a job
  wordcount: run the wordcount example
  in: input data location
  out: output data location
  $ bin/hadoop dfs -ls
  Found 4 items
  drwxr-xr-x   - hadoop supergroup          0 2014-02-18 09:52 /user/hadoop/in
  drwxr-xr-x   - hadoop supergroup          0 2014-02-18 11:16 /user/hadoop/out
  $ bin/hadoop dfs -ls ./out
  Found 2 items
  drwxr-xr-x   - hadoop supergroup          0 2014-02-18 11:15 /user/hadoop/out/_logs
  -rw-r--r--   2 hadoop supergroup         25 2014-02-18 11:16 /user/hadoop/out/part-r-00000
  $ bin/hadoop dfs -cat ./out/part-r-00000
  hadoop  1
  hello   2
  world   1
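  As a sanity check, the same counts can be reproduced locally with plain shell (run from /home/hadoop/hadoop-0.20.2, where the input directory was created):
  $ cat input/*.txt | tr ' ' '\n' | sort | uniq -c
        1 hadoop
        2 hello
        1 world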
  9. Watch hadoop activity through the web UIs
  $ netstat -all |grep :5
  getnameinfo failed
  tcp      0      0 *:57195                     *:*                         LISTEN
  tcp      0      0 hadoop1:ssh               10.10.1.198:58000         ESTABLISHED
  tcp      0      0 *:50030                     *:*                         LISTEN
  tcp      0      0 *:50070                     *:*                         LISTEN
  tcp      0      0 *:59420                     *:*                         LISTEN
  tcp      0      0 *:50849                     *:*                         LISTEN
  tcp      0      0 *:50090                     *:*                         LISTEN
  --- Monitor the jobtracker
  http://10.10.1.131:50030/jobtracker.jsp
  hadoop1 Hadoop Map/Reduce Administration
  Quick Links
  Scheduling Info
  Running Jobs
  Completed Jobs
  Failed Jobs
  Local Logs
  State: INITIALIZING
  Started: Mon Feb 17 21:33:23 CST 2014
  Version: 0.20.2, r911707
  Compiled: Fri Feb 19 08:07:34 UTC 2010 by chrisdo
  Identifier: 201402172133
  --------------------------------------------------------------------------------
  Cluster Summary (Heap Size is 117.94 MB/888.94 MB)
  Maps: 0   Reduces: 0   Total Submissions: 0   Nodes: 0
  Map Task Capacity: 0   Reduce Task Capacity: 0   Avg. Tasks/Node: -   Blacklisted Nodes: 0
  --------------------------------------------------------------------------------
  Scheduling Information
  Queue Name   Scheduling Information
  default      N/A
  --------------------------------------------------------------------------------
  Filter (Jobid, Priority, User, Name)
  Example: 'user:smith 3200' will filter by 'smith' only in the user field and '3200' in all fields
  --------------------------------------------------------------------------------
  Running Jobs
  none
  --------------------------------------------------------------------------------
  Completed Jobs
  none
  --------------------------------------------------------------------------------
  Failed Jobs
  none
  --------------------------------------------------------------------------------
  Local Logs
  Log directory, Job Tracker History
  --------------------------------------------------------------------------------
  Hadoop, 2014.
  --- Monitor HDFS
  http://10.10.1.131:50070/dfshealth.jsp
  NameNode 'hadoop1:9000'
  Started: Mon Feb 17 12:35:22 CST 2014
  Version: 0.20.2, r911707
  Compiled: Fri Feb 19 08:07:34 UTC 2010 by chrisdo
  Upgrades: There are no upgrades in progress.
  Browse the filesystem
  Namenode Logs
  --------------------------------------------------------------------------------
  Cluster Summary
  1 files and directories, 0 blocks = 1 total. Heap Size is 117.94 MB / 888.94 MB (13%)
  Configured Capacity : 0 KB
  DFS Used : 0 KB
  Non DFS Used : 0 KB
  DFS Remaining : 0 KB
  DFS Used% : 100 %
  DFS Remaining% : 0 %
  Live Nodes: 0
  Dead Nodes: 0
  There are no datanodes in the cluster
  --------------------------------------------------------------------------------
  NameNode Storage:
  Storage Directory Type State
  /tmp/hadoop-hadoop/dfs/name IMAGE_AND_EDITS Active
  --------------------------------------------------------------------------------
  Hadoop, 2014.
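  The same pages can be checked headlessly from the shell; a hedged sketch (the capture above uses the default HDFS UI port 50070, even though dfs.http.address was set to 50071 earlier, so adjust the port to whichever your namenode actually bound):
  $ curl -s http://10.10.1.131:50030/jobtracker.jsp | grep -i 'state'
  $ curl -s http://10.10.1.131:50070/dfshealth.jsp | grep -i 'live nodes'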
