akyou56 发表于 2019-1-7 06:00:54

高可用集群之heartbeat基于crm进行资源管理(二)

  一、高可用集群之heartbeat基于crm进行资源管理
  1、集群的工作模型:
  A/P:两个节点,工作与主备模型
  N-M N>M,N个节点,M个服务
  N-N:N个节点,N个服务
  A/A:双主模型:
  

  2、资源转移的方式
  rgmanager:failover domain priority
  pacemaker:
  资源黏性:
  资源约束(三种类型):
  位置约束:资源更倾向于那个节点上
  inf:无穷大
  n:
  -n:
  -inf:负无穷
  排列约束:资源运行在同一节点的倾向性
  inf:
  -inf:
  顺序约束:资源的启动次序及关闭次序
  

  3、如何让web service中的三个资源:VIP、httpd和filesystem运行于同一节点上
  1.排列约束
  2.资源组(resource group)
  

  4、如果节点不在是集群节点成员时,如何处理运行于当前节点的资源
  stopped:停止
  ignore:忽略
  freeze:不连接新的请求
  suicide:将服务器kill
  

  5、一个资源刚配置完成时,是否启动
  target-role?
  

  6、RA类型
  heartbeat legacy
  LSB
  OCF
  STONITH
  7、资源类型
  primitive,native:主资源,只能运行于一个节点
  group:组资源
  clone:克隆资源
  总克隆数,每个节点最多可运行的克隆数
  stonith cluster filesystem
  master/salve:主从资源
  8、分布式锁:
  

  /usr/lib64/heartbeat
  hearsources2cib.py
  

  9、图形化配置
  ha.cf
  crm on

  

  /usr/lib64/heartbeat/ha_propagate 将配置文件传送到别的节点

  

  10、安装gui
  heartbeat v2使用crm作为ijiqun资源管理器:需要在ha.cf中添加

  crm on
  crm通过mgmtd集成监听5560/tcp
  需要启动hb_gui的主机为hacluster用户添加密码,使用hb_gui启动

  

  with quorum:拥有法定票数
  without quorum :不拥有法定票数
  

  11、定义高可用的web service
  VIP
  httpd
  

  from
  to:以它为基础
  

  

  web service
  VIP
  httpd
  NFS
  

  注意haresources与crm不兼容,不被crm所读取
  

  二、配置
  1、ha.cf
  # vim /etc/ha.d/ha.cf
  mcast eth0 225.0.100.19 694 1 0
  crm on
  

  # /usr/lib64/heartbeat/ha_propagate
  Propagating HA configuration files to node datanode4.abc.com.
  ha.cf                                       100%   10KB10.4KB/s   00:00
  authkeys                                    100%694   0.7KB/s   00:00
  Setting HA startup configuration on node datanode4.abc.com.
  

  2、注意haresources与crm不兼容,不被crm所读取
  # mv /etc/ha.d/haresources /root
  

  底下mv是datanode4的主机
  # mv haresources /root/
  

  # service heartbeat start
  logd is already running
  Starting High-Availability services:
  Done.
  

  # ssh datanode4 'service heartbeat start'
  logd is already running
  Starting High-Availability services:
  Done.
  

  3、查看日志
  # tail -f /var/log/messages
  Jun 19 16:00:29 snn crmd: : notice: populate_cib_nodes: Node: datanode4.abc.com (uuid: 0862d824-047e-4826-9e26-21a7603f53c8)
  Jun 19 16:00:30 snn crmd: : notice: populate_cib_nodes: Node: snn.abc.com (uuid: 6009ca6a-56eb-4d35-872e-3b8dc0fc9851)
  Jun 19 16:00:30 snn crmd: : info: do_ha_control: Connected to Heartbeat
  Jun 19 16:00:30 snn crmd: : info: do_ccm_control: CCM connection established... waiting for first callback
  Jun 19 16:00:30 snn crmd: : info: do_started: Delaying start, CCM (0000000000100000) not connected
  Jun 19 16:00:30 snn crmd: : info: crmd_init: Starting crmd's mainloop
  Jun 19 16:00:30 snn crmd: : notice: crmd_client_status_callback: Status update: Client snn.abc.com/crmd now has status
  Jun 19 16:00:30 snn crmd: : notice: crmd_client_status_callback: Status update: Client snn.abc.com/crmd now has status
  Jun 19 16:00:30 snn crmd: : notice: crmd_client_status_callback: Status update: Client datanode4.abc.com/crmd now has status
  Jun 19 16:00:30 snn cib: : info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
  Jun 19 16:00:30 snn cib: : info: mem_handle_event: instance=5, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
  Jun 19 16:00:30 snn cib: : info: cib_ccm_msg_callback: PEER: datanode4.abc.com
  Jun 19 16:00:30 snn cib: : info: cib_ccm_msg_callback: PEER: snn.abc.com
  Jun 19 16:00:31 snn crmd: : info: do_started: Delaying start, CCM (0000000000100000) not connected
  Jun 19 16:00:31 snn crmd: : info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
  Jun 19 16:00:31 snn crmd: : info: mem_handle_event: instance=5, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
  Jun 19 16:00:31 snn crmd: : info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=5)
  Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: NEW MEMBERSHIP: trans=5, nodes=2, new=2, lost=0 n_idx=0, new_idx=0, old_idx=4
  Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: #011CURRENT: datanode4.abc.com
  Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: #011CURRENT: snn.abc.com
  Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: #011NEW:   datanode4.abc.com
  Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: #011NEW:   snn.abc.com
  Jun 19 16:00:31 snn crmd: : info: do_started: The local CRM is operational
  Jun 19 16:00:31 snn crmd: : info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_CCM_CALLBACK origin=do_started ]
  

  4、查看集群监控状态
  //如果想它只显示一次使用crm_mon --one-shot
  # crm_mon
  Refresh in 6s...
  

  ============
  Last updated: Fri Jun 19 16:11:34 2015
  Current DC: snn.abc.com (6009ca6a-56eb-4d35-872e-3b8dc0fc9851)
  2 Nodes configured.
  0 Resources configured.
  ============
  

  Node: datanode4.abc.com (0862d824-047e-4826-9e26-21a7603f53c8): online
  Node: snn.abc.com (6009ca6a-56eb-4d35-872e-3b8dc0fc9851): online
  

  

  4、crm的命令工具
  # crm_sh
  /usr/sbin/crm_sh:31: DeprecationWarning: The popen2 module is deprecated.Use the subprocess module.
  from popen2 import Popen3
  crm # help
  Usage: crm (nodes|config|resources)
  crm # nodes
  crm nodes # help
  Usage: nodes (status|list)
  crm nodes # list
  
  
  crm nodes #
  

  5、安装heartbeat的时候自动创建一个用户hacluster,但没有密码,需要创建
  # cat /etc/passwd |grep hacluster
  hacluster:x:498:498:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin
  

  # passwd hacluster
  更改用户 hacluster 的密码 。
  新的 密码:
  无效的密码: WAY 过短
  无效的密码: 过于简单
  重新输入新的 密码:
  passwd: 所有的身份验证令牌已经成功更新。
  

  6、直接运行hb_gui
  # hb_gui
  Traceback (most recent call last):
  File "/usr/bin/hb_gui", line 41, in
  import gtk, gtk.glade, gobject
  File "/usr/lib64/python2.6/site-packages/gtk-2.0/gtk/__init__.py", line 64, in
  _init()
  File "/usr/lib64/python2.6/site-packages/gtk-2.0/gtk/__init__.py", line 52, in _init
  _gtk.init_check()
  RuntimeError: could not open display
  以上有错误提示
http://s3.运维网.com/wyfs02/M02/6E/BB/wKiom1WD4kKjHhq7AAKYGhh-2CI498.jpg
  

  在客户端下载安装Xmanager即可
  在重执行命令

http://s3.运维网.com/wyfs02/M01/6E/BB/wKiom1WD5IqhUc9SAATt458E-Gk900.jpg
  

  三、ha_gui定义
  1、定义主资源名称
http://s3.运维网.com/wyfs02/M01/6E/BB/wKiom1WD6obRT9D7AAIGDmQ8Q0c090.jpg
http://s3.运维网.com/wyfs02/M02/6E/BB/wKiom1WD68-DYSRoAAMtZVcpp9A772.jpg
http://s3.运维网.com/wyfs02/M01/6E/BB/wKiom1WD7P3Doq-HAAOUjNleYHs777.jpg
http://s3.运维网.com/wyfs02/M02/6E/B8/wKioL1WD7rvSGA57AAKp1DNjKL0275.jpg
  

  2、继继定义主资源

http://s3.运维网.com/wyfs02/M01/6E/BC/wKiom1WD7zqSGPJRAAHxYAQLXVc864.jpg
http://s3.运维网.com/wyfs02/M02/6E/BC/wKiom1WD70ThbOgEAAMIoGhwXXQ196.jpg
http://s3.运维网.com/wyfs02/M00/6E/B8/wKioL1WD8QTS3DV8AAKFjBQlP-o714.jpg
http://s3.运维网.com/wyfs02/M02/6E/B9/wKioL1WD9YXSnFgFAAJQK879AXg158.jpg
  

  3、让两个资源运行同一个节点,方法有两种:(1)定义排列约束,(2)定义资源组
  (1)定义排列约束
http://s3.运维网.com/wyfs02/M01/6E/BC/wKiom1WD9heC77D6AAH_sfWkoJ0818.jpg
http://s3.运维网.com/wyfs02/M02/6E/B9/wKioL1WD99eiBjFNAAOlnl00Zjw580.jpg
http://s3.运维网.com/wyfs02/M00/6E/BC/wKiom1WD9oGgrfb2AAH4VGL2ac8178.jpg
http://s3.运维网.com/wyfs02/M00/6E/BC/wKiom1WD9t_QLYzqAAEawH1atdk598.jpg
  

  4、让snn节点成为备的
http://s3.运维网.com/wyfs02/M02/6E/BC/wKiom1WD98yS7sX-AAIv0RMduF4084.jpg
http://s3.运维网.com/wyfs02/M00/6E/B9/wKioL1WD-X-BQLiUAAJdCkZBs-s135.jpg
  

  

  四、定义组的方式
      web server:
  vip:192.168.1.8
  httpd
  nfs:/192.168.1.4:/web/htdocs挂在到/var/www/html
  1、删除原来主资源
http://s3.运维网.com/wyfs02/M02/6E/B9/wKioL1WD-_6BKl24AAHktqrP8ZI523.jpg
  2、定义群主源
http://s3.运维网.com/wyfs02/M02/6E/BA/wKioL1WECZSBUX-BAAITS2IhDXw499.jpg
http://s3.运维网.com/wyfs02/M02/6E/BD/wKiom1WEB-KBPjLnAAMH5qhE7XI203.jpg
http://s3.运维网.com/wyfs02/M00/6E/BA/wKioL1WECZXiY09nAAI9tZyRdEE589.jpg
http://s3.运维网.com/wyfs02/M00/6E/BD/wKiom1WEB-Oy5shJAAOr9bfv-Bg743.jpg
http://s3.运维网.com/wyfs02/M02/6E/BD/wKiom1WEB-OCBJnKAAMwMXL56jM083.jpg
http://s3.运维网.com/wyfs02/M01/6E/BA/wKioL1WECZbj9WRcAAJzBPG6uvc027.jpg
http://s3.运维网.com/wyfs02/M01/6E/BA/wKioL1WECg3QYMOeAALWGkxiE-c668.jpg
  

  3、httpd无法启动,查看日志如下
http://s3.运维网.com/wyfs02/M01/6E/BD/wKiom1WECJSzDz34AAtEM83nkEY497.jpg
  从日志 来看,nfs正常挂在到4这主机上,但httpd先启动后又关闭,奇怪了
  

  4、来到datanode4这台机子,单独启动httpd看看,没有成功
  # /etc/init.d/httpd restart
  停止 httpd:                                             [失败]
  正在启动 httpd:Syntax error on line 292 of /etc/httpd/conf/httpd.conf:
  

  5、查看SElinux状态,吓了一跳,问题出现在这里
  # getenforce
  Enforcing
  # setenforce 0

  # getenforce
  Permissive
  把配置文件改成disabled
  # vim /etc/selinux/config
  SELINUX=disabled
  

  6、单独在启动httpd看看
  # /etc/init.d/httpd start
  正在启动 httpd:                                           [确定]
  # /etc/init.d/httpd stop
  停止 httpd:
  

  7、再回到snn输入hb_gui看看,之前是webserivce,这不影响名称是可以随便定义,我之前删了,就重新建一资源为了好识别,就定义了httpd
http://s3.运维网.com/wyfs02/M02/6E/CF/wKiom1WIzPPwYZ-hAAKZsMNIKvc228.jpg
  

  五、验证
  1、nfs的index.html内容
  # cat /web/htdocs/index.html
  datanode.abc.com
  # ifconfig eth0
  eth0      Link encap:EthernetHWaddr 00:0C:29:50:AC:6E
  inet addr:192.168.1.4Bcast:192.168.1.255Mask:255.255.255.0
  inet6 addr: fe80::20c:29ff:fe50:ac6e/64 Scope:Link
  UP BROADCAST RUNNING MULTICASTMTU:1500Metric:1
  RX packets:83505 errors:0 dropped:0 overruns:0 frame:0
  TX packets:2037 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:7403212 (7.0 MiB)TX bytes:228350 (222.9 KiB)
  

  2、datanode4的主机的vip地址,如果单纯输入ifocnfig,不能显示出来的,它没有利用别名来定义,所以要用的ip addr show
  # ifconfig eth0

  eth0      Link encap:EthernetHWaddr 00:0C:29:E1:2F:66
  inet addr:192.168.1.6Bcast:192.168.1.255Mask:255.255.255.0
  inet6 addr: fe80::20c:29ff:fee1:2f66/64 Scope:Link
  UP BROADCAST RUNNING MULTICASTMTU:1500Metric:1
  RX packets:147365 errors:0 dropped:0 overruns:0 frame:0
  TX packets:66651 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:20284443 (19.3 MiB)TX bytes:14571080 (13.8 MiB)
  

  # ip addr show
  1: lo:mtu 65536 qdisc noqueue state UNKNOWN
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
  inet 127.0.0.1/8 scope host lo
  inet6 ::1/128 scope host
  valid_lft forever preferred_lft forever
  2: eth0:mtu 1500 qdisc pfifo_fast state UP qlen 1000
  link/ether 00:0c:29:e1:2f:66 brd ff:ff:ff:ff:ff:ff
  inet 192.168.1.6/24 brd 192.168.1.255 scope global eth0
  inet 192.168.1.8/24 brd 192.168.1.255 scope global secondary eth0 //显示vip地址
  inet6 fe80::20c:29ff:fee1:2f66/64 scope link
  valid_lft forever preferred_lft forever
  


  3、在浏览器输入,注意这里输入的是vip地址

http://s3.运维网.com/wyfs02/M00/6E/CC/wKioL1WI0lyDPQPjAAEaLtP2p60235.jpg
  4、如果datanode4成为备用
  

http://s3.运维网.com/wyfs02/M01/6E/D0/wKiom1WI1U7yT1sqAAHvAZXbXAI916.jpg
  

  到snn主机上看,转移成功

  
  # ip addr show
  1: lo:mtu 16436 qdisc noqueue state UNKNOWN
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
  inet 127.0.0.1/8 scope host lo
  inet6 ::1/128 scope host
  valid_lft forever preferred_lft forever
  2: eth0:mtu 1500 qdisc pfifo_fast state UP qlen 1000
  link/ether 00:0c:29:b1:89:48 brd ff:ff:ff:ff:ff:ff
  inet 192.168.1.5/24 brd 192.168.1.255 scope global eth0
  inet 192.168.1.8/24 brd 192.168.1.255 scope global secondary eth0
  inet6 fe80::20c:29ff:feb1:8948/64 scope link
  valid_lft forever preferred_lft forever
  

  
涮新浏览器,还是原来的内容
  http://s3.运维网.com/wyfs02/M01/6E/D0/wKiom1WI1iWB4tzcAAELNxoVsag863.jpg



页: [1]
查看完整版本: 高可用集群之heartbeat基于crm进行资源管理(二)