[Experience Sharing] hadoop 2.x-HDFS HA --Part II: installation

  this article is the follow-up to hadoop 2.x-HDFS HA --Part I: abstraction, and will cover these topics:

2. installation of HA
2.1 manual failover
2.2 auto failover
3. conclusion
   2. installation of HA
    2.1 manual failover
  in this mode, the cluster can be failed over manually with a couple of commands, but it is hard to know when/why a failure occurs. Of course, this is better than nothing :)
  here are the role assignments in my cluster:

host   NN   DN   JN
hd1    y    y    y
hd2    y    y    y

   yes, in general the number of journalnodes is recommended to be odd, because a quorum of N members only tolerates (N-1)/2 failures, so an even extra member buys no additional fault tolerance; but this cluster is just for validating the HA function, so it passes!
    building on the configs in install hadoop-2.5 without HDFS HA/Federation, the following properties need to be added or changed:

mode: HA with manual failover -- properties to be added or changed (shown as property = value, with notes):

dfs.nameservices = mycluster
    (the logical name of this name service)
dfs.ha.namenodes.mycluster = nn1,nn2
    (the property name is formed as dfs.ha.namenodes.#serviceName#; this name service contains two namenodes)
dfs.namenode.rpc-address.mycluster.nn1 = hd1:8020
    (the internal RPC address)
dfs.namenode.rpc-address.mycluster.nn2 = hd2:8020
dfs.namenode.http-address.mycluster.nn1 = hd1:50070
    (the web UI address)
dfs.namenode.http-address.mycluster.nn2 = hd2:50070
dfs.namenode.shared.edits.dir = qjournal://hd1:8485;hd2:8485/mycluster
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.fencing.methods = sshfence
dfs.ha.fencing.ssh.private-key-files = /home/hadoop/.ssh/id_rsa
dfs.ha.fencing.ssh.connect-timeout = 10000
fs.defaultFS = hdfs://mycluster
    (set in core-site.xml; the suffix of this value must be the same as the 'dfs.nameservices' value set in hdfs-site.xml)
dfs.journalnode.edits.dir = /usr/local/hadoop/data-2.5.1/journal
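  for concreteness, here is a minimal sketch of how the list above maps into the actual config files (the dfs.* properties go into hdfs-site.xml, fs.defaultFS into core-site.xml); all names and values are exactly the ones listed above:

<!-- hdfs-site.xml (sketch; values taken from the list above) -->
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>hd1:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>hd2:8020</value></property>
<property><name>dfs.namenode.http-address.mycluster.nn1</name><value>hd1:50070</value></property>
<property><name>dfs.namenode.http-address.mycluster.nn2</name><value>hd2:50070</value></property>
<property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://hd1:8485;hd2:8485/mycluster</value></property>
<property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
<property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
<property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hadoop/.ssh/id_rsa</value></property>
<property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>10000</value></property>
<property><name>dfs.journalnode.edits.dir</name><value>/usr/local/hadoop/data-2.5.1/journal</value></property>

<!-- core-site.xml -->
<property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>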
   
  2.1.2 steps to start up
   the steps below must be run in order.
   2.1.2.1 start the journalnodes
     go to every journalnode host and run:

sbin/hadoop-daemon.sh start journalnode
 2.1.2.2 go to the first NN, format it, then start it

hdfs namenode -format
hadoop-daemon.sh start namenode
   then go to the remaining (standby) NN node to copy over the fsimage and start it, by running:

bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
  2.1.2.3 start the datanodes


sbin/hadoop-daemons.sh start datanode
  now both namenodes are in the 'standby' state (yes, this is the default behaviour in manual mode; if you want a namenode to become active by default, use the auto-failover mode described later in this page)
  here you can use some commands to transition standby to active and vice versa:

hadoop@ubuntu:/usr/local/hadoop/hadoop-2.5.1/etc/hadoop-ha-manual$ hdfs haadmin
Usage: DFSHAAdmin [-ns <nameserviceId>]
[-transitionToActive <serviceId> [--forceactive]]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
[-help <command>]
  a. transition the standby namenode nn1 to active; this returns nothing if it is already active
hdfs haadmin -transitionToActive nn1
  b. check whether nn1 is in the active state
hdfs haadmin -getServiceState nn1
    it will print 'active' as the result.  c. then check its health:

hdfs haadmin -checkHealth nn1
   this returns nothing if it is healthy; otherwise a 'connection exception' will be shown here.  d. yes, you can also fail over from a dead namenode to another one to make the latter active:

hdfs haadmin -failover nn1 nn2
   this switches the active role from nn1 (active) to nn2 (standby). If you specify the option '--forcefence', namenode nn1 will also be killed for fencing, so be prudent with it.  e. stop-dfs.sh will shut down all processes in the cluster, and start-dfs.sh will start them all again:

stop-dfs.sh
start-dfs.sh
hdfs haadmin -transitionToActive nn1
   by now, we can see that nn1 is 'active' and nn2 is 'standby':

[screenshots: the two namenode web UIs, one showing 'active' and the other 'standby']
 
  below is the console output of starting and stopping the whole cluster:
  starting:

hadoop@ubuntu:/usr/local/hadoop/hadoop-2.5.1$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hd1 hd2]
hd1: starting namenode, logging to /usr/local/hadoop/hadoop-2.5.1/logs/hadoop-hadoop-namenode-ubuntu.out
hd2: starting namenode, logging to /usr/local/hadoop/hadoop-2.5.1/logs/hadoop-hadoop-namenode-bfadmin.out
hd1: datanode running as process 1539. Stop it first.
hd2: datanode running as process 3081. Stop it first.
Starting journal nodes [hd1 hd2]
hd1: journalnode running as process 1862. Stop it first.
hd2: starting journalnode, logging to /usr/local/hadoop/hadoop-2.5.1/logs/hadoop-hadoop-journalnode-bfadmin.out
Starting ZK Failover Controllers on NN hosts [hd1 hd2]
hd1: zkfc running as process 2090. Stop it first.
hd2: zkfc running as process 3388. Stop it first.
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.5.1/logs/yarn-hadoop-resourcemanager-ubuntu.out
hd2: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.5.1/logs/yarn-hadoop-nodemanager-bfadmin.out
hd1: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.5.1/logs/yarn-hadoop-nodemanager-ubuntu.out
  stopping (the sequence mirrors starting):

hadoop@ubuntu:/usr/local/hadoop/hadoop-2.5.1$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [hd1 hd2]
hd1: stopping namenode
hd2: stopping namenode
hd1: stopping datanode
hd2: stopping datanode
Stopping journal nodes [hd1 hd2]
hd1: stopping journalnode
hd2: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [hd1 hd2]
hd1: stopping zkfc
hd2: stopping zkfc
stopping yarn daemons
no resourcemanager to stop
hd1: no nodemanager to stop
hd2: no nodemanager to stop
no proxyserver to stop
  --------------
  now you can test some failover cases:

hdfs dfs -put test.txt /
kill #process-of-nn1#
hdfs haadmin -transitionToActive nn2
# test whether nn1's edits were synchronised to nn2 -- yes, you will see the file listed there correctly
hdfs dfs -ls /
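  to confirm the transition actually happened before the final read, a quick check (a sketch, reusing the haadmin commands shown earlier):

hdfs haadmin -getServiceState nn1   # expected to fail with a connection exception, since nn1's process was killed
hdfs haadmin -getServiceState nn2   # should print 'active'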
  ------------------
  below is a test of killing a journalnode to check the cluster's robustness
  after stopping hd1's journalnode, the namenode on hd1 (the same host) aborts:

2014-11-12 16:45:15,102 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 10008ms to send a batch of 1 edits (17 bytes) to remote journal 192.168.1.25:8485
2014-11-12 16:45:15,102 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.25:8485, 192.168.1.30:8485], stream=QuorumOutputStream starting at txid 261))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/2. 1 successful responses:
192.168.1.30:8485: null [success]
1 exceptions thrown:
192.168.1.25:8485: Call From ubuntu/192.168.1.25 to hd1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

  note the message 'Got too many exceptions to achieve quorum size 2/2. 1 successful responses ... 1 exceptions thrown': with only two journalnodes the required quorum is 2, so a single journalnode failure already breaks the quorum and the namenode aborts; with three journalnodes the quorum would still be 2 and one failure could be tolerated.
  =============
    2.2 auto failover
  the so-called 'auto failover' is the opposite of 'manual failover': it uses a coordination system (i.e. ZooKeeper) to automatically recover the namenodes when failures occur, e.g. hardware faults, software bugs, etc. When such a problem arises, the HA machinery detects which namenode has dropped out of the active role, and the standby NN then takes over the role that was previously 'active'.
  here are the configs to add on top of the manual-failover ones:

properties to be added or changed (shown as property = value, with notes):

dfs.ha.automatic-failover.enabled = true
    (fail over automatically when possible)
ha.zookeeper.quorum = hd1:2181,hd2:2181
    (yes, you can see that both the journalnode and zookeeper roles here come in even numbers, but that is ok for a test too!)
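  again as a sketch, these two map into the config files like this (dfs.ha.automatic-failover.enabled belongs in hdfs-site.xml, ha.zookeeper.quorum in core-site.xml):

<!-- hdfs-site.xml -->
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>

<!-- core-site.xml -->
<property><name>ha.zookeeper.quorum</name><value>hd1:2181,hd2:2181</value></property>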
   
  and the zkfc (the ZooKeeper failover controller that runs beside each namenode to detect failures) roles are assigned like this:

host   NN   JN   DN   ZKFC (new)
hd1    y    y    y    y
hd2    y    y    y    y
  2.2.1 steps to start up
  a. format the HA znode in ZooKeeper:

hdfs zkfc -formatZK
  b. start everything:

start-dfs.sh
  this starts the NN, JN, DN and zkfc processes.
  now just do whatever you want to do; the auto failover will function properly. Have a nice experience with it!
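  to see it in action, a quick experiment (a sketch; this assumes nn1 is currently active and that you run the kill on nn1's host, filling in the #pid# placeholder from the jps output):

jps                                  # find the NameNode pid on the active host
kill -9 #pid-of-active-namenode#
sleep 10                             # give the zkfc a moment to detect the failure
hdfs haadmin -getServiceState nn2    # the former standby should now report 'active'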
  refs:

- HDFS High Availability Using the Quorum Journal Manager
- Hadoop 2.0 NameNode HA和Federation实践
