q6542125 posted on 2018-10-31 07:57:40

A detailed Hadoop 1.1.2 cluster installation walkthrough

  Our company is preparing to migrate its data to HBase, so I had to learn Hadoop. We had first tried migrating to MongoDB: over a weekend I ran a small experiment at home, a Java + MongoDB crawler, with MongoDB installed on a CentOS 6.3 VM given 1 GB of RAM. Half an hour into the crawl the VM had exhausted its memory, and I found no way around the problem. Many people online say you should not run MongoDB without 32 GB of RAM, which is absurd; am I supposed to keep 1 TB of data in memory? That is one of the reasons we chose HBase.
Software environment
  Installed in virtual machines: CentOS 6.3, jdk-7u45-linux-i586.rpm, hadoop-1.1.2.tar.gz, hbase-0.96.1.1-hadoop1-bin.tar.gz, zookeeper-3.4.5-1374045102000.tar.gz
Step 1: Configure the network
  1. Configure the IP address
  Check the current IP with ifconfig, then edit the interface config with vi /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth0 is the NIC). Adapt it to your own network, for example:
  DEVICE=eth0
  BOOTPROTO=static
  ONBOOT=yes
  IPADDR=192.168.1.110
  NETMASK=255.255.255.0
  TYPE=Ethernet
  GATEWAY=192.168.1.1
  The MAC address in this file does not need to be changed.
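  To apply the change, restart the network service and confirm the new address took effect (standard CentOS 6 commands, not from the original write-up):
  # service network restart
  # ifconfig eth0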
  2. Configure the hostname and DNS
  Edit vi /etc/sysconfig/network to set the hostname and gateway (the gateway here is optional), as follows:
  NETWORKING=yes
  NETWORKING_IPV6=no
  HOSTNAME=master
  GATEWAY=192.168.1.1
  Edit vi /etc/resolv.conf to add DNS servers. Pick ones commonly used in your province; skip 8.8.8.8, which is too slow here. DNS is needed so that yum can download packages:
  nameserver 202.106.0.20
  nameserver 192.168.1.1
  Edit the vi /etc/hosts file as follows:
  192.168.1.110 master
  192.168.1.111 node1
  192.168.1.112 node2
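  Once all three machines have these entries, a quick sanity check that the names resolve:
  # ping -c 1 node1
  # ping -c 1 node2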

3. Configure the yum repository and install base packages
  Configuring a yum mirror makes installing packages easier; I use the 163 (NetEase) mirror, which is fast inside China, though this step is optional. Download CentOS-Base.repo as described at http://mirrors.163.com/.help/centos.html, upload it into /etc/yum.repos.d (overwriting the existing file), then run yum makecache to refresh the cache. Install the base packages:
  # yum -y install lrzsz gcc gcc-c++ libstdc++-devel ntp
  4. Synchronize the time zone and clock
  cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

  This sets the time zone to Shanghai. Then update the clock once with ntpdate, and schedule a nightly sync via crontab -e by adding (note: no leading #, or cron treats the line as a comment):
  30 23 * * * /usr/sbin/ntpdate cn.pool.ntp.org ; hwclock -w
  This updates the time at 23:30 every night and writes it back to the hardware clock.
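  A one-off sync right away, the same pair of commands the cron entry runs, is a reasonable first check:
  # /usr/sbin/ntpdate cn.pool.ntp.org
  # hwclock -w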
  5. Disable the firewall and set SELINUX=disabled
  # service iptables stop
  # vi /etc/selinux/config
  SELINUX=disabled
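  Note that service iptables stop only lasts until the next reboot, and the SELinux change in the config file only takes effect after one. To cover both immediately and persistently (standard CentOS 6 commands):
  # chkconfig iptables off
  # setenforce 0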
Step 2: Install Java, configure SSH, add a user
  1. Install Java 1.7
  Put all the software packages into the /opt directory:
  # chmod -R 777 /opt
  # rpm -ivh jdk-7u45-linux-i586.rpm
  # vi /etc/profile.d/java_hadoop.sh
  export JAVA_HOME=/usr/java/jdk1.7.0_45/
  export PATH=$PATH:$JAVA_HOME/bin
  # source /etc/profile
  # echo $JAVA_HOME
  /usr/java/jdk1.7.0_45/
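  A quick extra check, not in the original notes, that the JDK on the PATH is the one just installed:
  # java -version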


  2. Add the hadoop user and configure passwordless SSH login
  # groupadd hadoop
  # useradd hadoop -g hadoop
  # su hadoop
  $ ssh-keygen -t dsa -P '' -f /home/hadoop/.ssh/id_dsa

  $ cp /home/hadoop/.ssh/id_dsa.pub /home/hadoop/.ssh/authorized_keys
  $ chmod go-wx /home/hadoop/.ssh/authorized_keys
  $ ssh master
  Last login: Sun Mar 23 23:16:00 2014 from 192.168.1.110
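  If ssh master still prompts for a password, the usual culprit is over-permissive directories; tightening them (assuming the default home layout) normally fixes it:
  $ chmod 700 /home/hadoop/.ssh
  $ chmod 600 /home/hadoop/.ssh/authorized_keys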
  With all of the above in place, Hadoop installation can begin.
  For why Hadoop 1 was chosen, see Xiang Lei's blog on 51CTO (http://slaytanic.blog.51cto.com/2057708/1397396); in short, it is the more mature, stable product.
Start the installation:
  # ls
  hadoop-1.1.2.tar.gz
  hbase-0.96.1.1-hadoop1-bin.tar.gz
  zookeeper-3.4.5-1374045102000.tar.gz
  # Upload the required files and packages. The HBase build must match the Hadoop version; hbase/lib/hadoop-core-1.1.2.jar confirms a correct match.
  # mkdir -p /opt/modules/hadoop/
  # chown -R hadoop:hadoop /opt/modules/hadoop/*
  # ll
  total 148232
  -rwxrwxrwx 1 hadoop hadoop 61927560 Oct 29 11:16 hadoop-1.1.2.tar.gz
  -rwxrwxrwx 1 hadoop hadoop 73285670 Mar 24 12:57 hbase-0.96.1.1-hadoop1-bin.tar.gz
  -rwxrwxrwx 1 hadoop hadoop 16402010 Mar 24 12:57 zookeeper-3.4.5-1374045102000.tar.gz
  # Create the /opt/modules/hadoop/ folder, copy all the packages into it, and set its ownership to the hadoop group and hadoop user. This step can actually be done last.
  # tar -zxvf hadoop-1.1.2.tar.gz
  # Extract the hadoop-1.1.2.tar.gz archive
  # cat /etc/profile.d/java_hadoop.sh
  export JAVA_HOME=/usr/java/jdk1.7.0_45/
  export HADOOP_HOME=/opt/modules/hadoop/hadoop-1.1.2/
  export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
  # Use vi to add the HADOOP_HOME variable; do not use a custom name for it (that causes a lot of trouble).
  # source /etc/profile
  # Reload the environment variables.
  # echo $HADOOP_HOME
  /opt/modules/hadoop/hadoop-1.1.2
  # Echo $HADOOP_HOME and it prints the path we set.
  $ vi /opt/modules/hadoop/hadoop-1.1.2/conf/hadoop-env.sh
  export HADOOP_HEAPSIZE=64
  # This configures the heap size; my VM is set to 64 MB.
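  Many Hadoop 1.x setups also set JAVA_HOME directly in hadoop-env.sh rather than relying on the login environment; if startup later complains that JAVA_HOME is not set, adding this line to the same file should fix it:
  export JAVA_HOME=/usr/java/jdk1.7.0_45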
  # mkdir -p /data/
  # Create the data folder; all data will be stored under it.
  # mkdir -p /data/hadoop/hdfs/data
  # mkdir -p /data/hadoop/hdfs/name
  # mkdir -p /data/hadoop/hdfs/namesecondary/
  $ mkdir -p /data/hadoop/mapred/mrlocal
  $ mkdir -p /data/hadoop/mapred/mrsystem
  $ su
  # Switch to the root user; otherwise ownership cannot be changed.
  # chown -R hadoop:hadoop /data/
  # Set ownership of the data folder to the hadoop group and hadoop user
  # su hadoop
  # Switch to the hadoop user
  $ chmod go-w /data/hadoop/hdfs/data/
  # This step is very important: it removes write access to the HDFS data directory for other users; 755 also works
  $ ll /data/hadoop/hdfs/
  drwxr-xr-x 2 hadoop hadoop 4096 Mar 24 13:21 data
  drwxrwxr-x 2 hadoop hadoop 4096 Mar 24 13:21 name
  drwxrwxr-x 2 hadoop hadoop 4096 Mar 24 13:20 namesecondary
Next, configure the XML files:
  $ cd /opt/modules/hadoop/hadoop-1.1.2/conf
  $ vi core-site.xml
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
  <property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
  </property>
  <property>
  <name>fs.checkpoint.dir</name>
  <value>/data/hadoop/hdfs/namesecondary</value>
  </property>
  <property>
  <name>fs.checkpoint.period</name>
  <value>1800</value>
  </property>
  <property>
  <name>fs.checkpoint.size</name>
  <value>33554432</value>
  </property>
  <property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
  <name>fs.trash.interval</name>
  <value>1440</value>
  </property>
  </configuration>
  $ vi hdfs-site.xml
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
  <property>
  <name>dfs.name.dir</name>
  <value>/data/hadoop/hdfs/name</value>
  </property>
  <property>
  <name>dfs.data.dir</name>
  <value>/data/hadoop/hdfs/data</value>
  </property>
  <property>
  <name>dfs.http.address</name>
  <value>master:50070</value>
  </property>
  <property>
  <name>dfs.secondary.http.address</name>
  <value>node1:50090</value>
  </property>
  <property>
  <name>dfs.replication</name>
  <value>3</value>
  </property>
  <property>
  <name>dfs.datanode.du.reserved</name>
  <value>873741824</value>
  </property>
  <property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  </property>
  <property>
  <name>dfs.permissions</name>
  <value>false</value>
  </property>
  </configuration>
  $ vi mapred-site.xml
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
  <property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
  </property>
  <property>
  <name>mapred.local.dir</name>
  <value>/data/hadoop/mapred/mrlocal</value>
  <final>true</final>
  </property>
  <property>
  <name>mapred.system.dir</name>
  <value>/data/hadoop/mapred/mrsystem</value>
  <final>true</final>
  </property>
  <property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <final>true</final>
  </property>
  <property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
  <final>true</final>
  </property>
  <property>
  <name>io.sort.mb</name>
  <value>32</value>
  <final>true</final>
  </property>
  <property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx128M</value>
  </property>
  <property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
  </property>
  </configuration>
  # Configuration files done.
  # Switch to the master host and continue:
  # vi /opt/modules/hadoop/hadoop-1.1.2/conf/masters
  node1
  node2
  # vi /opt/modules/hadoop/hadoop-1.1.2/conf/slaves
  master
  node1
  node2
  # Node configuration. This is important: do not add master itself to the masters file (in Hadoop 1.x it lists the SecondaryNameNode hosts).
Configuring node1 and node2 for Hadoop 1.1.2
  I will not repeat the network setup covered earlier:
  # Log in to master and push authorized_keys to each node
  # scp /home/hadoop/.ssh/authorized_keys root@node1:/home/hadoop/.ssh/
  # scp /home/hadoop/.ssh/authorized_keys root@node2:/home/hadoop/.ssh/
  The authenticity of host 'node1 (192.168.1.111)' can't be established.
  RSA key fingerprint is 0d:aa:04:89:28:44:b9:e8:bb:5e:06:d0:dc:de:22:85.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added 'node1,192.168.1.111' (RSA) to the list of known hosts.
  root@node1's password:
  # Switch to the node1 host
  # su hadoop
  $ ssh master
  Last login: Sun Mar 23 23:17:06 2014 from 192.168.1.110
  # vi /opt/modules/hadoop/hadoop-1.1.2/conf/masters
  node1
  node2
  # vi /opt/modules/hadoop/hadoop-1.1.2/conf/slaves
  master
  node1
  node2
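  Note that the hadoop-1.1.2 tree itself must already be present on node1 and node2. If it is not, one way (a sketch, assuming identical paths on every machine) is to copy it from master and fix ownership on each node:
  # scp -r /opt/modules/hadoop/hadoop-1.1.2 root@node1:/opt/modules/hadoop/
  # scp -r /opt/modules/hadoop/hadoop-1.1.2 root@node2:/opt/modules/hadoop/
  # chown -R hadoop:hadoop /opt/modules/hadoop/hadoop-1.1.2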
  # Switch back to the master host and start everything:
  $ hadoop namenode -format
  Warning: $HADOOP_HOME is deprecated.
  14/03/24 13:33:52 INFO namenode.NameNode: STARTUP_MSG:
  /************************************************************
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG:   host = master/192.168.1.110
  STARTUP_MSG:   args = [-format]
  STARTUP_MSG:   version = 1.1.2
  STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
  ************************************************************/
  Re-format filesystem in /data/hadoop/hdfs/name ? (Y or N) Y
  14/03/24 13:33:54 INFO util.GSet: VM type       = 32-bit
  14/03/24 13:33:54 INFO util.GSet: 2% max memory = 0.61875 MB
  14/03/24 13:33:54 INFO util.GSet: capacity      = 2^17 = 131072 entries
  14/03/24 13:33:54 INFO util.GSet: recommended=131072, actual=131072
  14/03/24 13:33:55 INFO namenode.FSNamesystem: fsOwner=hadoop
  14/03/24 13:33:55 INFO namenode.FSNamesystem: supergroup=supergroup
  14/03/24 13:33:55 INFO namenode.FSNamesystem: isPermissionEnabled=false
  14/03/24 13:33:55 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
  14/03/24 13:33:55 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
  14/03/24 13:33:55 INFO namenode.NameNode: Caching file names occuring more than 10 times

  14/03/24 13:33:55 INFO common.Storage: Image file of ...
  14/03/24 13:33:56 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/data/hadoop/hdfs/name/current/edits
  14/03/24 13:33:56 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/data/hadoop/hdfs/name/current/edits
  14/03/24 13:33:56 INFO common.Storage: Storage directory /data/hadoop/hdfs/name has been successfully formatted.
  14/03/24 13:33:56 INFO namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
  SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.110
  ************************************************************/
  $ start-all.sh
  $ jps
  7603 TaskTracker
  7241 DataNode
  7119 NameNode
  7647 Jps
  7473 JobTracker
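  With all five daemons up, a quick smoke test is worthwhile. A sketch, assuming the examples jar shipped in the 1.1.2 tarball; the pi job needs no input data:
  $ hadoop fs -ls /
  $ hadoop jar $HADOOP_HOME/hadoop-examples-1.1.2.jar pi 2 100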
Configuring zookeeper-3.4.5
  # Install on the master machine (the NameNode)
  # tar -zxvf zookeeper-3.4.5-1374045102000.tar.gz
  # chown -R hadoop:hadoop zookeeper-3.4.5
  # vi /opt/modules/hadoop/zookeeper-3.4.5/conf/zoo.cfg
  # The number of milliseconds of each tick
  tickTime=2000
  # The number of ticks that the initial
  # synchronization phase can take
  initLimit=10
  # The number of ticks that can pass between
  # sending a request and getting an acknowledgement
  syncLimit=5
  # the directory where the snapshot is stored.
  # do not use /tmp for storage, /tmp here is just
  # example sakes.
  dataDir=/data/zookeeper
  # the port at which the clients will connect
  clientPort=2181
  server.1=192.168.1.110:2888:3888
  server.2=192.168.1.111:2888:3888
  server.3=192.168.1.112:2888:3888
  # Create a file named myid in the dataDir set in zoo.cfg (here /data/zookeeper), with a value matching that host's server number: myid on the namenode (master) is 1, on node1 it is 2, and so on.
  # Be sure to read the maintenance section of the
  # administrator guide before turning on autopurge.
  #
  # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
  #
  # The number of snapshots to retain in dataDir
  #autopurge.snapRetainCount=3
  # Purge task interval in hours
  # Set to "0" to disable auto purge feature
  #autopurge.purgeInterval=1
  # Now create the directories and myid files.
  # mkdir -p /data/zookeeper/
  # chown -R hadoop:hadoop /data/zookeeper/
  # echo "1" > /data/zookeeper/myid
  # cat /data/zookeeper/myid
  1
  # chown -R hadoop:hadoop /data/zookeeper/*
  # scp -r /opt/modules/hadoop/zookeeper-3.4.5/ root@node1:/opt/modules/hadoop/
  # Send /opt/modules/hadoop/zookeeper-3.4.5 to the node1 node, where myid will be set to 2
  # Switch to node1
  # echo "2" > /data/zookeeper/myid
  # cat /data/zookeeper/myid
  2
  # chown -R hadoop:hadoop /opt/modules/hadoop/zookeeper-3.4.5
  # chown -R hadoop:hadoop /data/zookeeper/*
  # Switch back to master
  # su hadoop
  $ cd zookeeper-3.4.5
  $ ./zkServer.sh start
  JMX enabled by default
  Using config: /opt/modules/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
  Starting zookeeper ... STARTED
  $ jps
  5507 NameNode
  5766 JobTracker
  6392 Jps
  6373 QuorumPeerMain
  5890 TaskTracker
  5626 DataNode
  # On node1, do the same:
  # su hadoop
  $ cd bin/
  $ ./zkServer.sh start
  JMX enabled by default
  Using config: /opt/modules/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
  Starting zookeeper ... STARTED
  $ jps
  5023 SecondaryNameNode
  5120 TaskTracker
  5445 Jps
  4927 DataNode
  5415 QuorumPeerMain
  # With both sides started, test it; Mode: follower indicates normal operation
  $ ./zkServer.sh status
  JMX enabled by default
  Using config: /opt/modules/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
  Mode: follower
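  As a further check, the ZooKeeper CLI in bin/ can connect to the ensemble and list the root znode (a sketch; the address is the quorum configured above):
  $ ./zkCli.sh -server master:2181
  ls /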
  -----------------------------------zookeeper-3.4.5 configuration done-----------------------------------
  # On node2, start ZooKeeper the same way:
  # su hadoop
  $ cd /opt/modules/hadoop/zookeeper-3.4.5/bin/
  $ ./zkServer.sh start
Configuring HBase; this is needed on all three machines
  # tar -zxvf hbase-0.96.1.1-hadoop1-bin.tar.gz
  # Extract the archive
  # vi /etc/profile.d/java_hadoop.sh
  export JAVA_HOME=/usr/java/jdk1.7.0_45/
  export HADOOP_HOME=/opt/modules/hadoop/hadoop-1.1.2/
  export HBASE_HOME=/opt/modules/hadoop/hbase-0.96.1.1/
  export HBASE_CLASSPATH=/opt/modules/hadoop/hadoop-1.1.2/conf/
  export HBASE_MANAGES_ZK=true
  export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
  # Configure the environment variables.
  # source /etc/profile
  # echo $HBASE_CLASSPATH
  /opt/modules/hadoop/hadoop-1.1.2/conf/
  # vi /opt/modules/hadoop/hbase-0.96.1.1/conf/hbase-site.xml
  <?xml version="1.0"?>
  <configuration>
  <property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
  </property>
  <property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,node1,node2</value>
  </property>
  <property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/data/zookeeper</value>
  </property>
  </configuration>
  # cat /opt/modules/hadoop/hbase-0.96.1.1/conf/regionservers
  master
  node1
  node2
  # chown -R hadoop:hadoop /opt/modules/hadoop/hbase-0.96.1.1
  # su hadoop
  $ ll
  total 148244
  drwxr-xr-x 16 hadoop hadoop   4096 Mar 24 13:36 hadoop-1.1.2
  -rwxrwxrwx  1 hadoop hadoop 61927560 Oct 29 11:16 hadoop-1.1.2.tar.gz
  drwxr-xr-x  7 hadoop hadoop     4096 Mar 24 22:40 hbase-0.96.1.1
  -rwxrwxrwx  1 hadoop hadoop 73285670 Mar 24 12:57 hbase-0.96.1.1-hadoop1-bin.tar.gz
  drwxr-xr-x 10 hadoop hadoop     4096 Nov  5  2012 zookeeper-3.4.5
  -rwxrwxrwx  1 hadoop hadoop 16402010 Mar 24 12:57 zookeeper-3.4.5-1374045102000.tar.gz
  $ scp -r hbase-0.96.1.1 node1:/opt/modules/hadoop
  $ scp -r hbase-0.96.1.1 node2:/opt/modules/hadoop
  # On node1:
  # chown -R hadoop:hadoop /opt/modules/hadoop/hbase-0.96.1.1
  # On node2:
  # chown -R hadoop:hadoop /opt/modules/hadoop/hbase-0.96.1.1
  # Back on master:
  # su hadoop
  $ hbase shell
  # Enter the HBase shell
  # jps
  17616 QuorumPeerMain
  20282 HRegionServer
  20101 HMaster
  9858 JobTracker
  9712 DataNode
  9591 NameNode
  29655 Jps
  9982 TaskTracker
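  From the shell, a minimal round trip verifies that the RegionServers are usable. A sketch; the table name t1 and column family cf are throwaway choices, not from the original:
  create 't1', 'cf'
  put 't1', 'r1', 'cf:a', 'v1'
  scan 't1'
  disable 't1'
  drop 't1'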



  This is the first time I have written up something this long. I have uploaded some files for testing; I will not spell out every command, so the install may not succeed on the first pass. Permissions are the most important issue. I plan to record a video and turn this into a shell script for colleagues and other readers to learn from.
  All configuration files and packages can be downloaded here: http://pan.baidu.com/share/link?shareid=2478581294&uk=3607515896
