[Experience Share] Hadoop Notes
Posted on 2018-10-29 13:30:48
  Package downloads
  http://archive.cloudera.com/cdh4/cdh/4/
  http://apache.fayea.com/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
  http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
  http://apache.opencas.org/hbase/1.2.0/hbase-1.2.0-bin.tar.gz
  http://download.oracle.com/otn-pub/java/jdk/8u73-b02/jdk-8u73-linux-x64.tar.gz
  Environment
  10.200.140.58  hadoop-308.99bill.com  # physical machine: datanode, zookeeper, regionserver
  10.200.140.59  hadoop-309.99bill.com  # physical machine: datanode, zookeeper, regionserver
  10.200.140.60  hadoop-310.99bill.com  # physical machine: datanode, zookeeper, regionserver
  10.200.140.45  hadoop-311.99bill.com  # VM: master (namenode)
  10.200.140.46  hadoop-312.99bill.com  # VM: secondary namenode, hmaster
  Set the hostname on every node and disable IPv6.
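A minimal sketch of the IPv6 switch-off, assuming the usual sysctl keys (the helper only prints the lines; append them to /etc/sysctl.conf as root on every node, then reload):

```shell
# Hypothetical helper: print the sysctl settings that disable IPv6.
gen_ipv6_off() {
  printf '%s\n' \
    'net.ipv6.conf.all.disable_ipv6 = 1' \
    'net.ipv6.conf.default.disable_ipv6 = 1'
}
gen_ipv6_off   # append this output to /etc/sysctl.conf, then run: sysctl -p
```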
  cat /etc/profile
  export JAVA_HOME=/opt/jdk1.7.0_80/
  PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
  CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
  export JAVA_HOME
  export PATH

  export HADOOP_BASE=/opt/oracle/hadoop
  export HADOOP_HOME=/opt/oracle/hadoop
  export YARN_HOME=/opt/oracle/hadoop
  export PATH=$HADOOP_BASE/bin:$PATH
  10.200.140.45 (the master) must be able to reach every node over passwordless SSH.
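A sketch of setting up the passwordless login from 10.200.140.45, assuming the oracle account and the hostnames above; the helper only prints the commands so they can be reviewed before piping to sh:

```shell
# Generate a key once on the master (no passphrase), e.g.:
#   ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# Then push the public key to each node:
gen_ssh_copy() {
  for h in hadoop-308 hadoop-309 hadoop-310 hadoop-311 hadoop-312; do
    echo "ssh-copy-id oracle@${h}.99bill.com"
  done
}
gen_ssh_copy   # review the output, then: gen_ssh_copy | sh
```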
  [oracle@hadoop-311 hadoop]$ cat core-site.xml
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop-311.99bill.com:9000</value>
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>16384</value>
    </property>
  </configuration>
  [oracle@hadoop-311 hadoop]$ cat hdfs-site.xml
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/opt/hadoop/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/opt/hadoop/data/dfs</value>
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>150</value>
    </property>
    <property>
      <name>dfs.blocksize</name>
      <value>64m</value>
    </property>
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>1073741824</value>
      <final>true</final>
    </property>
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/opt/oracle/hadoop/etc/hadoop/slave-deny-list</value>
    </property>
    <property>
      <name>dfs.namenode.http-address</name>
      <value>hadoop-311.99bill.com:50070</value>
    </property>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>hadoop-312.99bill.com:50090</value>
    </property>
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
  </configuration>
  [oracle@hadoop-311 hadoop]$ cat mapred-site.xml
  <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>4000</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4000</value>
    </property>
  </configuration>
  Define the datanodes:
  [oracle@hadoop-311 hadoop]$ cat slaves
  hadoop-308.99bill.com
  hadoop-309.99bill.com
  hadoop-310.99bill.com
  hadoop-env.sh
  export HADOOP_LOG_DIR=$HADOOP_HOME/logs
  export HADOOP_PID_DIR=/opt/oracle/hadoop
  export HADOOP_SECURE_DN_PID_DIR=/opt/oracle/hadoop
  export JAVA_HOME=/opt/jdk1.7.0_80/
  export HADOOP_HEAPSIZE=6000
  exec_time=`date +'%Y%m%d-%H%M%S'`
  export HADOOP_NAMENODE_OPTS="-Xmx6g ${HADOOP_NAMENODE_OPTS}"
  export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx6g ${HADOOP_SECONDARYNAMENODE_OPTS}"
  export HADOOP_DATANODE_OPTS="-server -Xmx6000m -Xms6000m -Xmn1000m -XX:PermSize=128M -XX:MaxPermSize=128M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HADOOP_LOG_DIR/gc-$(hostname)-datanode-${exec_time}.log -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=10 -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=20"
  [oracle@hadoop-311 hadoop]$ cat yarn-site.xml
  <configuration>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>hadoop-311.99bill.com:8032</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>hadoop-311.99bill.com:8030</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>hadoop-311.99bill.com:8031</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address</name>
      <value>hadoop-311.99bill.com:8033</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>hadoop-311.99bill.com:8088</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <!-- Hadoop 2.x requires the underscore form; "mapreduce.shuffle" is rejected -->
      <value>mapreduce_shuffle</value>
    </property>
  </configuration>
  Start the Hadoop cluster
  On the first run the namenode must be formatted; this step is not repeated on later startups:
  hadoop/bin/hdfs namenode -format
  Then start Hadoop:
  hadoop/sbin/start-all.sh
  After startup, if there are no errors, run jps to list the current processes; NameNode is the Hadoop master process, and SecondaryNameNode and ResourceManager are also Hadoop processes.
  [oracle@hadoop-311 hadoop]$ jps
  13332 Jps
  5430 NameNode
  5719 ResourceManager
  3. ZooKeeper Cluster Installation
  1. Unpack zookeeper-3.4.8.tar.gz, rename it to zookeeper, cd into zookeeper/conf, run cp zoo_sample.cfg zoo.cfg, and edit it:
  [oracle@hadoop-308 conf]$ cat zoo.cfg
  # The number of milliseconds of each tick
  tickTime=2000
  maxClientCnxns=0
  # The number of ticks that the initial
  # synchronization phase can take
  initLimit=50
  # The number of ticks that can pass between
  # sending a request and getting an acknowledgement
  syncLimit=5
  # number of snapshots to retain
  autopurge.snapRetainCount=2
  # purge task interval in hours
  autopurge.purgeInterval=84
  # the directory where the snapshot is stored
  dataDir=/opt/hadoop/zookeeperdata
  # the port at which the clients will connect
  clientPort=2181
  server.1=hadoop-308:2888:3888
  server.2=hadoop-309:2888:3888
  server.3=hadoop-310:2888:3888
  2. Create the data directory and the myid file:
  mkdir /opt/hadoop/zookeeperdata
  echo "1" > /opt/hadoop/zookeeperdata/myid
  3. Then sync the zookeeper directory to the other two nodes, and on each node change myid to that node's server number.
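The per-node myid assignment can be scripted; this sketch maps each hostname to its server.N number from zoo.cfg and prints the remote commands for review (hostnames assumed resolvable):

```shell
# Hypothetical helper: zookeeper server id for each host, per zoo.cfg above.
myid_for() {
  case "$1" in
    hadoop-308) echo 1 ;;
    hadoop-309) echo 2 ;;
    hadoop-310) echo 3 ;;
  esac
}
# Print the ssh commands that create the data dir and write each node's myid.
for h in hadoop-308 hadoop-309 hadoop-310; do
  echo "ssh $h 'mkdir -p /opt/hadoop/zookeeperdata && echo $(myid_for "$h") > /opt/hadoop/zookeeperdata/myid'"
done
```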
  Start zookeeper:
  cd /opt/oracle/zookeeper
  ./bin/zkServer.sh start
  [oracle@hadoop-308 tools]$ jps
  11939 Jps
  4373 DataNode
  8579 HRegionServer
  4. HBase Cluster Installation and Configuration
  1. Unpack hbase-1.2.0-bin.tar.gz, rename it to hbase, and edit hbase/conf/hbase-env.sh:
  export HBASE_MANAGES_ZK=false
  export HBASE_HEAPSIZE=4000
  export JAVA_HOME=/opt/jdk1.7.0_80/
  [oracle@hadoop-311 conf]$ cat hbase-site.xml
  <configuration>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://hadoop-311:9000/hbase</value>
      <description>The directory shared by region servers.</description>
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.master.port</name>
      <value>60000</value>
    </property>
    <property>
      <name>hbase.master</name>
      <value>hadoop-312</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>hadoop-308,hadoop-309,hadoop-310</value>
    </property>
    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>300</value>
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>70</value>
    </property>
    <property>
      <name>zookeeper.session.timeout</name>
      <value>60000</value>
    </property>
    <property>
      <name>hbase.regionserver.restart.on.zk.expire</name>
      <value>true</value>
      <description>Zookeeper session expired will force regionserver exit.
      Enable this will make the regionserver restart.</description>
    </property>
    <property>
      <name>hbase.replication</name>
      <value>false</value>
    </property>
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.4</value>
    </property>
    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.35</value>
    </property>
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>8</value>
    </property>
    <property>
      <name>hbase.server.thread.wakefrequency</name>
      <value>100</value>
    </property>
    <property>
      <name>hbase.master.distributed.log.splitting</name>
      <value>false</value>
    </property>
    <property>
      <name>hbase.regionserver.hlog.splitlog.writer.threads</name>
      <value>3</value>
    </property>
    <property>
      <name>hbase.client.scanner.caching</name>
      <value>10</value>
    </property>
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>134217728</value>
    </property>
    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.coprocessor.user.region.classes</name>
      <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
    </property>
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2096</value>
      <description>PRIVATE CONFIG VARIABLE</description>
    </property>
  </configuration>
  Distribute hbase to the other 4 nodes.
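Distribution can be done with rsync; a sketch that prints the copy commands for the other four nodes (user and paths assumed from above), so they can be reviewed before piping to sh:

```shell
# Hypothetical helper: print one rsync per target node.
gen_hbase_push() {
  for h in hadoop-308 hadoop-309 hadoop-310 hadoop-312; do
    echo "rsync -az /opt/oracle/hbase/ oracle@${h}.99bill.com:/opt/oracle/hbase/"
  done
}
gen_hbase_push   # review, then: gen_hbase_push | sh
```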
  5. Starting the Cluster
  1. Start zookeeper:
  zookeeper/bin/zkServer.sh start
  2. Start Hadoop:
  $ hadoop/sbin/start-all.sh
  Edit hbase/conf/hbase-site.xml:
  [oracle@hadoop-311 conf]$ cat hbase-site.xml
  (identical to the hbase-site.xml listed in section 4 above)
  hbase-env.sh
  export JAVA_HOME=/opt/jdk1.7.0_80/
  export HBASE_CLASSPATH=/opt/oracle/hadoop/conf
  export HBASE_HEAPSIZE=4000
  export HBASE_OPTS="-XX:PermSize=512M -XX:MaxPermSize=512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=10 -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=20"
  exec_time=`date +'%Y%m%d-%H%M%S'`
  export HBASE_MASTER_OPTS="-Xmx4096m -Xms4096m -Xmn128m  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-master-${exec_time}.log"
  export HBASE_REGIONSERVER_OPTS="-Xmx8192m -Xms8192m -Xmn512m  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-regionserver-${exec_time}.log"
  export HBASE_MANAGES_ZK=false
  [oracle@hadoop-311 conf]$ cat regionservers
  hadoop-308
  hadoop-309
  hadoop-310
  Distribute to the other four machines:
  cd /opt/oracle/hbase
  sh bin/start-hbase.sh
  [oracle@hadoop-311 bin]$ ./hbase shell
  16/03/23 20:20:47 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
  HBase Shell; enter 'help' for list of supported commands.
  Type "exit" to leave the HBase Shell
  Version 0.94.15-cdh4.7.1, r, Tue Nov 18 08:42:59 PST 2014
  hbase(main):001:0> status
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/opt/oracle/hbase/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/opt/oracle/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  16/03/23 20:20:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  3 servers, 0 dead, 0.6667 average load
  10. Common Problems
  10.1. Abnormal namenode shutdown
  On every machine in the hadoop environment, list all Hadoop processes with jps, kill them, then restart everything in the normal startup order.
  10.2. Abnormal datanode shutdown
  Start HDFS on the namenode:
  run hadoop/sbin/start-all.sh
  If the datanode is also a zookeeper node, start zookeeper as well:
  on that datanode run zookeeper/bin/zkServer.sh start.
  Start HBase on the namenode:
  run hbase/bin/start-hbase.sh
  http://10.200.140.46:60010/master-status
  10.3. Stopping a non-master server
  Run on that server:
  hadoop/sbin/hadoop-daemon.sh stop datanode
  hadoop/sbin/yarn-daemon.sh stop nodemanager
  hbase/bin/hbase-daemon.sh stop regionserver
  Check http://10.200.140.45:50070/dfshealth.jsp until the node appears under dead nodes; once it does, the server can be shut down.
  (The original post included screenshots taken just after stopping the services and after the stop completed.)
  After the server is rebooted, start the services from hadoop001:
  hadoop/sbin/start-all.sh
  hbase/bin/start-hbase.sh
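An alternative to stopping the daemons directly is a graceful decommission through the exclude file already configured in hdfs-site.xml (dfs.hosts.exclude); a sketch, with the file path taken from that config and the actual cluster commands left commented for review:

```shell
# Path from dfs.hosts.exclude in hdfs-site.xml above.
EXCLUDE=/opt/oracle/hadoop/etc/hadoop/slave-deny-list
# Append a node to the exclude list.
add_to_exclude() { echo "$1" >> "$EXCLUDE"; }
# add_to_exclude hadoop-310.99bill.com
# hadoop/bin/hdfs dfsadmin -refreshNodes   # namenode re-reads the list; the node drains
```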
  11. Monitored Ports
  11.1. Namenode ports (hadoop001):
  60010,60000,50070,50030,9000,9001,10000
  11.2. zookeeper ports (hadoop003,hadoop004,hadoop005):
  2181
  11.3. Datanode ports (hadoop003,hadoop004,hadoop005,hadoop006,hadoop007):
  60030,50075
  12. Uneven HDFS File Distribution and a Slow Balancer
  The HMaster node has a start-balancer.sh script.
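The balancer's speed is usually limited by the per-datanode bandwidth cap; a sketch of raising the cap before a run (values are illustrative, commands printed for review):

```shell
# Convert MB/s to the byte value dfsadmin expects.
mb_to_bytes() { echo $(( $1 * 1024 * 1024 )); }
echo "hadoop/bin/hdfs dfsadmin -setBalancerBandwidth $(mb_to_bytes 10)"  # 10 MB/s per datanode
echo "hadoop/sbin/start-balancer.sh -threshold 5"   # stop once nodes are within 5% of average utilization
```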
  ########## Migration plan
  First prepare a fresh hadoop environment in the new data center.
  ### hadoop migration - hbase
  1. Confirm the new hbase runs normally and that the machines in both clusters can reach each other by hostname.  ok
  2. Stop the new hbase.  ok
  3. On any hadoop machine in either cluster run:
  ./hadoop distcp -bandwidth 10 -m 3 hdfs://hadoop001.99bill.com:9000/hbase/if_fss_files hdfs://hadoop-312.99bill.com:9000/hbase/if_fss_files
  4. With the attached script, run:
  hbase org.jruby.Main ~/add_table.rb /hbase/if_fss_files
  5. Start the new hbase.
  ### hadoop migration - hadoop data
  ######## Clean up hadoop files; re-archive anything whose packing failed
  e.g. for 2014-07-24:
  ./hdfs dfs -rm -r /fss/2014-07-24
  ./hdfs dfs -rm -r /fss/2014-07-24.har
  ./hdfs dfs -mv /fss/2014-07-24a.har /fss/2014-07-24.har
  ## Sync from the remote fss system to local disk in the new data center
  ./hdfs dfs -copyToLocal hdfs://hadoop001.99bill.com:9000/fss/2015-04-08.har /opt/sdb/hadoop/tmp/
  #### Import into the fss system from local disk in the new data center
  ./hdfs dfs -copyFromLocal /opt/sdb/hadoop/tmp/2015-04-08.har /fss/
  sleep 5
  ./hdfs dfs -copyFromLocal /opt/sdb/hadoop/tmp/2015-06/03-30.har   /fss/2015-06
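The copyToLocal/copyFromLocal pair above can be repeated per daily archive; a sketch that prints the command pairs for a list of dates (paths as above), for review before piping to sh:

```shell
# Hypothetical helper: emit the pull/push pair for each date argument.
gen_sync_cmds() {
  for d in "$@"; do
    echo "./hdfs dfs -copyToLocal hdfs://hadoop001.99bill.com:9000/fss/${d}.har /opt/sdb/hadoop/tmp/"
    echo "./hdfs dfs -copyFromLocal /opt/sdb/hadoop/tmp/${d}.har /fss/"
  done
}
gen_sync_cmds 2015-04-08 2015-04-09   # review, then pipe to sh, adding sleeps as needed
```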

