High-Availability Clusters: Resource Management with heartbeat and crm (Part 2)

akyou56 posted on 2019-1-7 06:00:54

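　　I. High-availability clusters: resource management with heartbeat and crm
　　1. Cluster working models:
　　A/P: two nodes working in an active/passive model
　　N-M (N > M): N nodes, M services
　　N-N: N nodes, N services
　　A/A: dual-active (active/active) model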
　　

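　　2. How resources are transferred
　　rgmanager: failover domain, priority
　　pacemaker:
　　Resource stickiness: how strongly a resource prefers to stay on the node where it is currently running
　　Resource constraints (three types):
　　Location constraint: which node a resource prefers to run on
　　inf: positive infinity (the resource must run on that node if it can)
　　n: a positive score (the resource prefers that node)
　　-n: a negative score (the resource prefers to avoid that node)
　　-inf: negative infinity (the resource must never run on that node)
　　Colocation constraint: how strongly resources tend to run on the same node
　　inf: the resources must run together
　　-inf: the resources must never run together
　　Order constraint: the order in which resources are started and stopped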
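To make the score notation above concrete, here is a minimal sketch of how a node could be picked from location scores plus stickiness. It is an illustrative model, not the cluster's actual policy engine: the numeric value used for inf and the helper names `add_scores` and `pick_node` are assumptions made for this example.

```python
INFINITY = 1000000  # assumed numeric stand-in for "inf"


def add_scores(a, b):
    """Add two scores; -inf beats everything, then +inf, else a clamped sum."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def pick_node(location_scores, current_node=None, stickiness=0):
    """Return the node with the highest total score, or None if every node is banned."""
    best_node, best_score = None, -INFINITY
    for node, score in location_scores.items():
        total = score
        if node == current_node:
            # Stickiness only counts for the node the resource already runs on.
            total = add_scores(total, stickiness)
        if total > best_score:
            best_node, best_score = node, total
    return best_node


scores = {"snn.abc.com": 100, "datanode4.abc.com": 50}
print(pick_node(scores))  # snn.abc.com
print(pick_node(scores, current_node="datanode4.abc.com", stickiness=200))  # datanode4.abc.com
```

With a stickiness of 200, the resource stays on datanode4 (50 + 200 = 250) even though snn has the higher location score, which is the point of resource stickiness.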
　　

　　3. How to make the three resources of the web service (VIP, httpd and filesystem) run on the same node
　　1. Colocation constraint
　　2. Resource group
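The relationship between the two options can be sketched in a few lines. The assumption here is the default group behavior, in which members are kept on one node and started in the listed order; the member order `vip, filesystem, httpd` and the function names are hypothetical and chosen only for illustration.

```python
def implied_constraints(members):
    """Express a group as the pairwise constraints it stands for."""
    pairs = list(zip(members, members[1:]))
    return {
        # (resource, runs_with): each member is placed with the one before it
        "colocation": [(later, earlier) for earlier, later in pairs],
        # (first, then): each member starts before the next one
        "order": [(earlier, later) for earlier, later in pairs],
    }


def group_plan(members):
    """Members start in listed order and stop in reverse order."""
    return {"start": list(members), "stop": list(reversed(members))}


group = ["vip", "filesystem", "httpd"]  # assumed member order
print(implied_constraints(group))
print(group_plan(group))
```

A group is therefore a shorthand: with three resources it replaces two colocation constraints and two order constraints that would otherwise be defined by hand.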
　　

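　　4. How to handle the resources running on a node when that node is no longer a member of the cluster
　　stop: stop the resources
　　ignore: ignore the situation and keep running
　　freeze: keep serving existing connections but accept no new requests
　　suicide: kill the server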
　　

　　5. Whether a resource is started as soon as it has been configured
　　target-role？
　　

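　　6. RA (resource agent) types
　　heartbeat legacy
　　LSB
　　OCF
　　STONITH
　　7. Resource types
　　primitive (native): a primitive resource, which can run on only one node
　　group: a resource group
　　clone: a cloned resource
　　total number of clones, and the maximum number of clones that may run on each node
　　typical uses: stonith, cluster filesystem
　　master/slave: a master/slave resource
　　8. Distributed locks: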
　　

　　/usr/lib64/heartbeat
　　haresources2cib.py: converts an existing haresources file into a CIB
　　

　　9. Graphical configuration
　　ha.cf
　　crm on

　　

　　/usr/lib64/heartbeat/ha_propagate copies the configuration files to the other nodes

　　

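　　10. Installing the GUI
　　heartbeat v2 uses crm as the cluster resource manager; to enable it, add the following to ha.cf:

　　crm on
　　crm listens on 5560/tcp through the integrated mgmtd
　　On the host where hb_gui will be started, set a password for the hacluster user, then launch the GUI with hb_gui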

　　

　　with quorum: the partition holds the required number of votes
　　without quorum: the partition does not hold the required number of votes
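The two states above can be illustrated with a plain majority rule. This is only a sketch under the assumption of one vote per node and a strict majority; the function name `has_quorum` is hypothetical, and the quorum mechanism actually in use may add tie-breakers that this model ignores.

```python
def has_quorum(votes_present, total_votes):
    """A partition holds quorum when it has more than half of all votes."""
    return votes_present * 2 > total_votes


print(has_quorum(2, 2))  # True: both nodes of a two-node cluster are up
print(has_quorum(1, 2))  # False: one of two nodes is exactly half, not more
print(has_quorum(2, 3))  # True
```

Under this rule a two-node cluster that loses one node is left without quorum, which is why the policy chosen for the "without quorum" case matters so much in small clusters.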
　　

　　11. Defining a highly available web service
　　VIP
　　httpd
　　

　　from
　　to: the resource taken as the base
　　

　　

　　web service
　　VIP
　　httpd
　　NFS
　　

　　Note: haresources is incompatible with crm and is not read by crm
　　

　　II. Configuration
　　1. ha.cf
　　# vim /etc/ha.d/ha.cf
　　mcast eth0 225.0.100.19 694 1 0
　　crm on
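As a reading aid for the two directives above, the sketch below splits ha.cf text into directives and names the five fields of the mcast line (device, multicast group, UDP port, TTL, loop). The helper names are hypothetical, and `crm_enabled` only recognizes the `crm on` spelling used in this article; other spellings that heartbeat may accept are not handled.

```python
def parse_ha_cf(text):
    """Return (directive, args) pairs, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        entries.append((parts[0], parts[1:]))
    return entries


def parse_mcast(args):
    """Name the fields of an mcast directive: dev, group, port, ttl, loop."""
    dev, group, port, ttl, loop = args
    return {"dev": dev, "group": group, "port": int(port),
            "ttl": int(ttl), "loop": int(loop)}


def crm_enabled(text):
    """True if the text contains the 'crm on' directive."""
    return any(d == "crm" and a[:1] == ["on"] for d, a in parse_ha_cf(text))


sample = "mcast eth0 225.0.100.19 694 1 0\ncrm on\n"
for directive, args in parse_ha_cf(sample):
    if directive == "mcast":
        print(parse_mcast(args))
print(crm_enabled(sample))  # True
```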
　　

　　# /usr/lib64/heartbeat/ha_propagate
　　Propagating HA configuration files to node datanode4.abc.com.
　　ha.cf                                     100%   10KB  10.4KB/s   00:00
　　authkeys                                  100%  694     0.7KB/s   00:00
　　Setting HA startup configuration on node datanode4.abc.com.
　　

　　2. Note: haresources is incompatible with crm and is not read by crm, so move it out of the way
　　# mv /etc/ha.d/haresources /root
　　

　　The mv below is run on host datanode4
　　# mv haresources /root/
　　

　　# service heartbeat start
　　logd is already running
　　Starting High-Availability services:
　　Done.
　　

　　# ssh datanode4 'service heartbeat start'
　　logd is already running
　　Starting High-Availability services:
　　Done.
　　

　　3. View the log
　　# tail -f /var/log/messages
　　Jun 19 16:00:29 snn crmd: : notice: populate_cib_nodes: Node: datanode4.abc.com (uuid: 0862d824-047e-4826-9e26-21a7603f53c8)
　　Jun 19 16:00:30 snn crmd: : notice: populate_cib_nodes: Node: snn.abc.com (uuid: 6009ca6a-56eb-4d35-872e-3b8dc0fc9851)
　　Jun 19 16:00:30 snn crmd: : info: do_ha_control: Connected to Heartbeat
　　Jun 19 16:00:30 snn crmd: : info: do_ccm_control: CCM connection established... waiting for first callback
　　Jun 19 16:00:30 snn crmd: : info: do_started: Delaying start, CCM (0000000000100000) not connected
　　Jun 19 16:00:30 snn crmd: : info: crmd_init: Starting crmd's mainloop
　　Jun 19 16:00:30 snn crmd: : notice: crmd_client_status_callback: Status update: Client snn.abc.com/crmd now has status
　　Jun 19 16:00:30 snn crmd: : notice: crmd_client_status_callback: Status update: Client snn.abc.com/crmd now has status
　　Jun 19 16:00:30 snn crmd: : notice: crmd_client_status_callback: Status update: Client datanode4.abc.com/crmd now has status
　　Jun 19 16:00:30 snn cib: : info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
　　Jun 19 16:00:30 snn cib: : info: mem_handle_event: instance=5, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
　　Jun 19 16:00:30 snn cib: : info: cib_ccm_msg_callback: PEER: datanode4.abc.com
　　Jun 19 16:00:30 snn cib: : info: cib_ccm_msg_callback: PEER: snn.abc.com
　　Jun 19 16:00:31 snn crmd: : info: do_started: Delaying start, CCM (0000000000100000) not connected
　　Jun 19 16:00:31 snn crmd: : info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
　　Jun 19 16:00:31 snn crmd: : info: mem_handle_event: instance=5, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
　　Jun 19 16:00:31 snn crmd: : info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=5)
　　Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: NEW MEMBERSHIP: trans=5, nodes=2, new=2, lost=0 n_idx=0, new_idx=0, old_idx=4
　　Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: #011CURRENT: datanode4.abc.com
　　Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: #011CURRENT: snn.abc.com
　　Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: #011NEW: datanode4.abc.com
　　Jun 19 16:00:31 snn crmd: : info: ccm_event_detail: #011NEW: snn.abc.com
　　Jun 19 16:00:31 snn crmd: : info: do_started: The local CRM is operational
　　Jun 19 16:00:31 snn crmd: : info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_CCM_CALLBACK origin=do_started ]
　　

　　4. Check the cluster status with the monitor
　　// to display the status only once, use crm_mon --one-shot
　　# crm_mon
　　Refresh in 6s...
　　

　　============
　　Last updated: Fri Jun 19 16:11:34 2015
　　Current DC: snn.abc.com (6009ca6a-56eb-4d35-872e-3b8dc0fc9851)
　　2 Nodes configured.
　　0 Resources configured.
　　============
　　

　　Node: datanode4.abc.com (0862d824-047e-4826-9e26-21a7603f53c8): online
　　Node: snn.abc.com (6009ca6a-56eb-4d35-872e-3b8dc0fc9851): online
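If the node status needs to be checked from a script rather than read by eye, the `Node:` lines shown above can be picked out of the text. The sketch below assumes the output format is exactly the one printed in this article; the regular expression and the function name `node_states` are illustrative, not part of any heartbeat tool.

```python
import re

# Matches lines such as:
# Node: snn.abc.com (6009ca6a-56eb-4d35-872e-3b8dc0fc9851): online
NODE_LINE = re.compile(r"^\s*Node:\s+(\S+)\s+\(([0-9a-f-]+)\):\s+(\S+)\s*$")


def node_states(crm_mon_output):
    """Map each node name to the status word printed after its UUID."""
    states = {}
    for line in crm_mon_output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(3)
    return states


sample = """\
2 Nodes configured.
0 Resources configured.
Node: datanode4.abc.com (0862d824-047e-4826-9e26-21a7603f53c8): online
Node: snn.abc.com (6009ca6a-56eb-4d35-872e-3b8dc0fc9851): online
"""
print(node_states(sample))
```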
　　

　　

　　5. The crm command-line tool
　　# crm_sh
　　/usr/sbin/crm_sh:31: DeprecationWarning: The popen2 module is deprecated.  Use the subprocess module.
　　from popen2 import Popen3
　　crm # help
　　Usage: crm (nodes|config|resources)
　　crm # nodes
　　crm nodes # help
　　Usage: nodes (status|list)
　　crm nodes # list
　　
　　
　　crm nodes #
　　

　　6. Installing heartbeat automatically creates a user named hacluster, but without a password, so one has to be set
　　# cat /etc/passwd | grep hacluster
　　hacluster:x:498:498:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin
　　

　　# passwd hacluster
　　Changing password for user hacluster.
　　New password:
　　BAD PASSWORD: it is WAY too short
　　BAD PASSWORD: is too simple
　　Retype new password:
　　passwd: all authentication tokens updated successfully.
　　

　　7. Run hb_gui directly
　　# hb_gui
　　Traceback (most recent call last):
　　File "/usr/bin/hb_gui", line 41, in <module>
　　import gtk, gtk.glade, gobject
　　File "/usr/lib64/python2.6/site-packages/gtk-2.0/gtk/__init__.py", line 64, in <module>
　　_init()
　　File "/usr/lib64/python2.6/site-packages/gtk-2.0/gtk/__init__.py", line 52, in _init
　　_gtk.init_check()
　　RuntimeError: could not open display
　　The command fails with the error above because no X display is available
http://s3.运维网.com/wyfs02/M02/6E/BB/wKiom1WD4kKjHhq7AAKYGhh-2CI498.jpg
　　

　　Downloading and installing Xmanager on the client solves this.
　　Then run the command again.

http://s3.运维网.com/wyfs02/M01/6E/BB/wKiom1WD5IqhUc9SAATt458E-Gk900.jpg
　　

　　III. Defining resources in hb_gui
　　1. Define the name of the primitive resource
http://s3.运维网.com/wyfs02/M01/6E/BB/wKiom1WD6obRT9D7AAIGDmQ8Q0c090.jpg
http://s3.运维网.com/wyfs02/M02/6E/BB/wKiom1WD68-DYSRoAAMtZVcpp9A772.jpg
http://s3.运维网.com/wyfs02/M01/6E/BB/wKiom1WD7P3Doq-HAAOUjNleYHs777.jpg
http://s3.运维网.com/wyfs02/M02/6E/B8/wKioL1WD7rvSGA57AAKp1DNjKL0275.jpg
　　

　　2. Continue defining primitive resources

http://s3.运维网.com/wyfs02/M01/6E/BC/wKiom1WD7zqSGPJRAAHxYAQLXVc864.jpg
http://s3.运维网.com/wyfs02/M02/6E/BC/wKiom1WD70ThbOgEAAMIoGhwXXQ196.jpg
http://s3.运维网.com/wyfs02/M00/6E/B8/wKioL1WD8QTS3DV8AAKFjBQlP-o714.jpg
http://s3.运维网.com/wyfs02/M02/6E/B9/wKioL1WD9YXSnFgFAAJQK879AXg158.jpg
　　

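　　3. There are two ways to make the two resources run on the same node: (1) define a colocation constraint, (2) define a resource group
　　(1) Define a colocation constraint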
http://s3.运维网.com/wyfs02/M01/6E/BC/wKiom1WD9heC77D6AAH_sfWkoJ0818.jpg
http://s3.运维网.com/wyfs02/M02/6E/B9/wKioL1WD99eiBjFNAAOlnl00Zjw580.jpg
http://s3.运维网.com/wyfs02/M00/6E/BC/wKiom1WD9oGgrfb2AAH4VGL2ac8178.jpg
http://s3.运维网.com/wyfs02/M00/6E/BC/wKiom1WD9t_QLYzqAAEawH1atdk598.jpg
　　

　　4. Put the snn node into standby
http://s3.运维网.com/wyfs02/M02/6E/BC/wKiom1WD98yS7sX-AAIv0RMduF4084.jpg
http://s3.运维网.com/wyfs02/M00/6E/B9/wKioL1WD-X-BQLiUAAJdCkZBs-s135.jpg
　　

　　

　　IV. Defining the resources as a group
　　web server:
　　vip: 192.168.1.8
　　httpd
　　nfs: 192.168.1.4:/web/htdocs mounted on /var/www/html
　　1. Delete the original primitive resources
http://s3.运维网.com/wyfs02/M02/6E/B9/wKioL1WD-_6BKl24AAHktqrP8ZI523.jpg
　　2. Define the group resource
http://s3.运维网.com/wyfs02/M02/6E/BA/wKioL1WECZSBUX-BAAITS2IhDXw499.jpg
http://s3.运维网.com/wyfs02/M02/6E/BD/wKiom1WEB-KBPjLnAAMH5qhE7XI203.jpg
http://s3.运维网.com/wyfs02/M00/6E/BA/wKioL1WECZXiY09nAAI9tZyRdEE589.jpg
http://s3.运维网.com/wyfs02/M00/6E/BD/wKiom1WEB-Oy5shJAAOr9bfv-Bg743.jpg
http://s3.运维网.com/wyfs02/M02/6E/BD/wKiom1WEB-OCBJnKAAMwMXL56jM083.jpg
http://s3.运维网.com/wyfs02/M01/6E/BA/wKioL1WECZbj9WRcAAJzBPG6uvc027.jpg
http://s3.运维网.com/wyfs02/M01/6E/BA/wKioL1WECg3QYMOeAALWGkxiE-c668.jpg
　　

　　3. httpd fails to start; the log shows the following
http://s3.运维网.com/wyfs02/M01/6E/BD/wKiom1WECJSzDz34AAtEM83nkEY497.jpg
　　Judging from the log, nfs is mounted correctly on host datanode4, but httpd starts and is then stopped again, which is strange
　　

　　4. Go to the datanode4 machine and try starting httpd on its own; it does not succeed
　　# /etc/init.d/httpd restart
　　Stopping httpd:                                           [FAILED]
　　Starting httpd: Syntax error on line 292 of /etc/httpd/conf/httpd.conf:
　　

　　5. Check the SELinux status; surprisingly, this is where the problem lies
　　# getenforce
　　Enforcing
　　# setenforce 0

　　# getenforce
　　Permissive
　　setenforce 0 only lasts until the next reboot, so also set the configuration file to disabled
　　# vim /etc/selinux/config
　　SELINUX=disabled
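The step above changes two different things: `setenforce 0` affects the running system, while `/etc/selinux/config` decides the mode at the next boot. A small sketch for checking the persistent setting follows; the function name `selinux_config_mode` is hypothetical and the sample text is only assumed to resemble a typical config file.

```python
def selinux_config_mode(config_text):
    """Return the value of SELINUX= from config text, or None if it is absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "SELINUX":
            return value.strip().lower()
    return None


sample = "# example content\nSELINUX=disabled\nSELINUXTYPE=targeted\n"
print(selinux_config_mode(sample))  # disabled
```

On a real host the text would come from reading `/etc/selinux/config`, and the result can be compared with what `getenforce` reports for the running system.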
　　

　　6. Try starting httpd on its own again
　　# /etc/init.d/httpd start
　　Starting httpd:                                        [  OK  ]
　　# /etc/init.d/httpd stop
　　Stopping httpd:
　　

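　　7. Back on snn, run hb_gui again. The group used to be called webservice; the name can be chosen freely, so this makes no difference. Because I had deleted it earlier, I created a new resource and named it httpd so that it is easy to recognize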
http://s3.运维网.com/wyfs02/M02/6E/CF/wKiom1WIzPPwYZ-hAAKZsMNIKvc228.jpg
　　

　　V. Verification
　　1. Content of index.html on the nfs server
　　# cat /web/htdocs/index.html
　　datanode.abc.com
　　# ifconfig eth0
　　eth0    Link encap:Ethernet  HWaddr 00:0C:29:50:AC:6E
　　inet addr:192.168.1.4  Bcast:192.168.1.255  Mask:255.255.255.0
　　inet6 addr: fe80::20c:29ff:fe50:ac6e/64 Scope:Link
　　UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
　　RX packets:83505 errors:0 dropped:0 overruns:0 frame:0
　　TX packets:2037 errors:0 dropped:0 overruns:0 carrier:0
　　collisions:0 txqueuelen:1000
　　RX bytes:7403212 (7.0 MiB)  TX bytes:228350 (222.9 KiB)
　　

　　2. The VIP on host datanode4: plain ifconfig does not show it, because the VIP is not defined as an interface alias, so ip addr show has to be used instead
　　# ifconfig eth0

　　eth0    Link encap:Ethernet  HWaddr 00:0C:29:E1:2F:66
　　inet addr:192.168.1.6  Bcast:192.168.1.255  Mask:255.255.255.0
　　inet6 addr: fe80::20c:29ff:fee1:2f66/64 Scope:Link
　　UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
　　RX packets:147365 errors:0 dropped:0 overruns:0 frame:0
　　TX packets:66651 errors:0 dropped:0 overruns:0 carrier:0
　　collisions:0 txqueuelen:1000
　　RX bytes:20284443 (19.3 MiB)  TX bytes:14571080 (13.8 MiB)
　　

　　# ip addr show
　　1: lo: mtu 65536 qdisc noqueue state UNKNOWN
　　link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
　　inet 127.0.0.1/8 scope host lo
　　inet6 ::1/128 scope host
　　valid_lft forever preferred_lft forever
　　2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
　　link/ether 00:0c:29:e1:2f:66 brd ff:ff:ff:ff:ff:ff
　　inet 192.168.1.6/24 brd 192.168.1.255 scope global eth0
　　inet 192.168.1.8/24 brd 192.168.1.255 scope global secondary eth0 // the VIP address shows up here
　　inet6 fe80::20c:29ff:fee1:2f66/64 scope link
　　valid_lft forever preferred_lft forever
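The same check can be scripted: the VIP is present on a node if it appears among the `inet` lines for eth0. The sketch below parses text in the format printed above; the function name `addresses_on` is hypothetical, and the sample is a trimmed copy of the output from datanode4.

```python
def addresses_on(ip_addr_output, device):
    """Return the IPv4 addresses (without prefix length) listed for a device."""
    found = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # IPv4 lines start with "inet" and end with the device name.
        if len(fields) >= 2 and fields[0] == "inet" and fields[-1] == device:
            found.append(fields[1].split("/")[0])
    return found


sample = """\
inet 127.0.0.1/8 scope host lo
inet 192.168.1.6/24 brd 192.168.1.255 scope global eth0
inet 192.168.1.8/24 brd 192.168.1.255 scope global secondary eth0
"""
vip = "192.168.1.8"
print(addresses_on(sample, "eth0"))         # ['192.168.1.6', '192.168.1.8']
print(vip in addresses_on(sample, "eth0"))  # True
```

Running the same check against the output of both nodes shows which one currently holds the VIP.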
　　

　　3. Open the site in a browser; note that the address entered here is the VIP

http://s3.运维网.com/wyfs02/M00/6E/CC/wKioL1WI0lyDPQPjAAEaLtP2p60235.jpg
　　4. If datanode4 is put into standby
　　

http://s3.运维网.com/wyfs02/M01/6E/D0/wKiom1WI1U7yT1sqAAHvAZXbXAI916.jpg
　　

　　Checking on host snn shows that the resources have moved over successfully

　　
　　# ip addr show
　　1: lo: mtu 16436 qdisc noqueue state UNKNOWN
　　link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
　　inet 127.0.0.1/8 scope host lo
　　inet6 ::1/128 scope host
　　valid_lft forever preferred_lft forever
　　2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
　　link/ether 00:0c:29:b1:89:48 brd ff:ff:ff:ff:ff:ff
　　inet 192.168.1.5/24 brd 192.168.1.255 scope global eth0
　　inet 192.168.1.8/24 brd 192.168.1.255 scope global secondary eth0
　　inet6 fe80::20c:29ff:feb1:8948/64 scope link
　　valid_lft forever preferred_lft forever
　　

　　
Refresh the browser: the content is still the same as before
　　http://s3.运维网.com/wyfs02/M01/6E/D0/wKiom1WI1iWB4tzcAAELNxoVsag863.jpg
