[Experience Share] Configuring RHCS on Red Hat 6 for a Two-Node HA Cluster

  

  I recently tested RHCS on Red Hat 6.5 and built a two-node HA cluster. This post shares the configuration and testing process, covering node configuration, the cluster management server, cluster creation and configuration, and cluster testing.
  

  I. Test Environment

  Hostname    OS          IP address        Cluster IP        Installed software
  ----------  ----------  ----------------  ----------------  ----------------------------------------
  HAmanager   RedHat 6.5  192.168.10.150    -                 luci, iSCSI target (for the quorum disk)
  node1       RedHat 6.5  192.168.10.104    192.168.10.103    High Availability group, httpd
  node2       RedHat 6.5  192.168.10.105    192.168.10.103    High Availability group, httpd
  

  II. Node Configuration
  1. Configure /etc/hosts on all three machines so they can resolve each other
[root@HAmanager ~]# cat /etc/hosts

  192.168.10.104 node1 node1.localdomain
  192.168.10.105 node2 node2.localdomain
  192.168.10.150 HAmanager HAmanager.localdomain
  

[root@node1 ~]# cat /etc/hosts

  192.168.10.104 node1 node1.localdomain
  192.168.10.105 node2 node2.localdomain
  192.168.10.150 HAmanager HAmanager.localdomain
  

[root@node2 ~]# cat /etc/hosts

  192.168.10.104 node1 node1.localdomain
  192.168.10.105 node2 node2.localdomain
  192.168.10.150 HAmanager HAmanager.localdomain


2. Set up SSH key trust among the three machines

[root@HAmanager ~]# ssh-keygen -t rsa

[root@HAmanager ~]# ssh-copy-id -i node1

  

[root@node1 ~]# ssh-keygen -t rsa

[root@node1 ~]# ssh-copy-id -i node2

  

[root@node2 ~]# ssh-keygen -t rsa

[root@node2 ~]# ssh-copy-id -i node1
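
  To confirm the trust works, a quick check like the following should return the remote hostname without prompting for a password (hostnames as defined in /etc/hosts above):

[root@HAmanager ~]# ssh node1 hostname
[root@node1 ~]# ssh node2 hostname
[root@node2 ~]# ssh node1 hostname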





3. Stop and disable the NetworkManager and acpid services on both nodes

[root@node1 ~]# service NetworkManager stop

[root@node1 ~]# chkconfig NetworkManager off

[root@node1 ~]# service acpid stop

[root@node1 ~]# chkconfig acpid off

  

[root@node2 ~]# service NetworkManager stop

[root@node2 ~]# chkconfig NetworkManager off

[root@node2 ~]# service acpid stop

[root@node2 ~]# chkconfig acpid off



  4. Configure a local yum repository on both nodes
[root@node1 ~]# cat /etc/yum.repos.d/rhel6.5.repo

[Server]

  name=base
  baseurl=file:///mnt/
  enabled=1
  gpgcheck=0
[HighAvailability]

  name=base
  baseurl=file:///mnt/HighAvailability
  enabled=1
  gpgcheck=0
  

[root@node2 ~]# cat /etc/yum.repos.d/rhel6.5.repo

[Server]

  name=base
  baseurl=file:///mnt/
  enabled=1
  gpgcheck=0
[HighAvailability]

  name=base
  baseurl=file:///mnt/HighAvailability
  enabled=1
  gpgcheck=0
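
  These repo files point at file:///mnt, so the RHEL 6.5 installation media must already be mounted there on each node. A minimal sketch, assuming the DVD (or attached ISO) appears as /dev/cdrom; adjust the device or ISO path to your environment:

[root@node1 ~]# mount /dev/cdrom /mnt        # or: mount -o loop /path/to/rhel-server-6.5-x86_64-dvd.iso /mnt
[root@node1 ~]# yum clean all
[root@node1 ~]# yum repolist                 # the Server and HighAvailability repos should both appear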


  5. Install the cluster package group on both nodes
[root@node1 ~]# yum groupinstall 'High Availability' -y

  Installed:
    ccs.x86_64 0:0.16.2-69.el6                 cman.x86_64 0:3.0.12.1-59.el6
    omping.x86_64 0:0.0.4-1.el6                rgmanager.x86_64 0:3.0.12.1-19.el6
  Dependency Installed:
    cifs-utils.x86_64 0:4.8.1-19.el6           clusterlib.x86_64 0:3.0.12.1-59.el6
    corosync.x86_64 0:1.4.1-17.el6             corosynclib.x86_64 0:1.4.1-17.el6
    cyrus-sasl-md5.x86_64 0:2.1.23-13.el6_3.1  fence-agents.x86_64 0:3.1.5-35.el6
    fence-virt.x86_64 0:0.2.3-15.el6           gnutls-utils.x86_64 0:2.8.5-10.el6_4.2
    ipmitool.x86_64 0:1.8.11-16.el6            keyutils.x86_64 0:1.4-4.el6
    libevent.x86_64 0:1.4.13-4.el6             libgssglue.x86_64 0:0.1-11.el6
    libibverbs.x86_64 0:1.1.7-1.el6            librdmacm.x86_64 0:1.0.17-1.el6
    libtirpc.x86_64 0:0.2.1-6.el6_4            libvirt-client.x86_64 0:0.10.2-29.el6
    lm_sensors-libs.x86_64 0:3.1.1-17.el6      modcluster.x86_64 0:0.16.2-28.el6
    nc.x86_64 0:1.84-22.el6                    net-snmp-libs.x86_64 1:5.5-49.el6
    net-snmp-utils.x86_64 1:5.5-49.el6         nfs-utils.x86_64 1:1.2.3-39.el6
    nfs-utils-lib.x86_64 0:1.1.5-6.el6         numactl.x86_64 0:2.0.7-8.el6
    oddjob.x86_64 0:0.30-5.el6                 openais.x86_64 0:1.1.1-7.el6
    openaislib.x86_64 0:1.1.1-7.el6            perl-Net-Telnet.noarch 0:3.03-11.el6
    pexpect.noarch 0:2.3-6.el6                 python-suds.noarch 0:0.4.1-3.el6
    quota.x86_64 1:3.17-20.el6                 resource-agents.x86_64 0:3.9.2-40.el6
    ricci.x86_64 0:0.16.2-69.el6               rpcbind.x86_64 0:0.2.0-11.el6
    sg3_utils.x86_64 0:1.28-5.el6              tcp_wrappers.x86_64 0:7.6-57.el6
    telnet.x86_64 1:0.17-47.el6_3.1            yajl.x86_64 0:1.0.7-3.el6

Complete!
[root@node2 ~]# yum groupinstall 'High Availability' -y

  Installed:
    ccs.x86_64 0:0.16.2-69.el6                 cman.x86_64 0:3.0.12.1-59.el6
    omping.x86_64 0:0.0.4-1.el6                rgmanager.x86_64 0:3.0.12.1-19.el6
  Dependency Installed:
    cifs-utils.x86_64 0:4.8.1-19.el6           clusterlib.x86_64 0:3.0.12.1-59.el6
    corosync.x86_64 0:1.4.1-17.el6             corosynclib.x86_64 0:1.4.1-17.el6
    cyrus-sasl-md5.x86_64 0:2.1.23-13.el6_3.1  fence-agents.x86_64 0:3.1.5-35.el6
    fence-virt.x86_64 0:0.2.3-15.el6           gnutls-utils.x86_64 0:2.8.5-10.el6_4.2
    ipmitool.x86_64 0:1.8.11-16.el6            keyutils.x86_64 0:1.4-4.el6
    libevent.x86_64 0:1.4.13-4.el6             libgssglue.x86_64 0:0.1-11.el6
    libibverbs.x86_64 0:1.1.7-1.el6            librdmacm.x86_64 0:1.0.17-1.el6
    libtirpc.x86_64 0:0.2.1-6.el6_4            libvirt-client.x86_64 0:0.10.2-29.el6
    lm_sensors-libs.x86_64 0:3.1.1-17.el6      modcluster.x86_64 0:0.16.2-28.el6
    nc.x86_64 0:1.84-22.el6                    net-snmp-libs.x86_64 1:5.5-49.el6
    net-snmp-utils.x86_64 1:5.5-49.el6         nfs-utils.x86_64 1:1.2.3-39.el6
    nfs-utils-lib.x86_64 0:1.1.5-6.el6         numactl.x86_64 0:2.0.7-8.el6
    oddjob.x86_64 0:0.30-5.el6                 openais.x86_64 0:1.1.1-7.el6
    openaislib.x86_64 0:1.1.1-7.el6            perl-Net-Telnet.noarch 0:3.03-11.el6
    pexpect.noarch 0:2.3-6.el6                 python-suds.noarch 0:0.4.1-3.el6
    quota.x86_64 1:3.17-20.el6                 resource-agents.x86_64 0:3.9.2-40.el6
    ricci.x86_64 0:0.16.2-69.el6               rpcbind.x86_64 0:0.2.0-11.el6
    sg3_utils.x86_64 0:1.28-5.el6              tcp_wrappers.x86_64 0:7.6-57.el6
    telnet.x86_64 1:0.17-47.el6_3.1            yajl.x86_64 0:1.0.7-3.el6
  Complete!
  

  6. Start the cluster services on both nodes and enable them at boot
[root@node1 ~]# service ricci start

[root@node1 ~]# chkconfig ricci on

[root@node1 ~]# chkconfig cman on

[root@node1 ~]# chkconfig rgmanager on

  

[root@node2 ~]# service ricci start

[root@node2 ~]# chkconfig ricci on

[root@node2 ~]# chkconfig cman on

[root@node2 ~]# chkconfig rgmanager on

  

  7. Set a password for the ricci user on both nodes
[root@node1 ~]# passwd ricci

  New password:
  BAD PASSWORD: it is too short
  BAD PASSWORD: is too simple
  Retype new password:
  passwd: all authentication tokens updated successfully.
  

[root@node2 ~]# passwd ricci

  New password:
  BAD PASSWORD: it is too short
  BAD PASSWORD: is too simple
  Retype new password:
  passwd: all authentication tokens updated successfully.


  8. Install httpd on both nodes, which will be used later to test application failover
[root@node1 ~]# yum -y install httpd

  

[root@node1 ~]# echo"This is Node1" > /var/www/html/index.html
[root@node2 ~]# yum -y install httpd

  [root@node2 ~]# echo"This is Node2" > /var/www/html/index.html
  III. Cluster Management Server Configuration
  1. Install the luci package on the cluster management server
[root@HAmanager ~]# yum -y install luci

  Installed:
  luci.x86_64 0:0.26.0-48.el6
  Dependency Installed:
  TurboGears2.noarch 0:2.0.3-4.el6
  python-babel.noarch 0:0.9.4-5.1.el6
  python-beaker.noarch 0:1.3.1-7.el6
  python-cheetah.x86_64 0:2.4.1-1.el6
  python-decorator.noarch 0:3.0.1-3.1.el6
  python-decoratortools.noarch 0:1.7-4.1.el6
  python-formencode.noarch 0:1.2.2-2.1.el6
  python-genshi.x86_64 0:0.5.1-7.1.el6
  python-mako.noarch 0:0.3.4-1.el6
  python-markdown.noarch 0:2.0.1-3.1.el6
  python-markupsafe.x86_64 0:0.9.2-4.el6
  python-myghty.noarch 0:1.1-11.el6
  python-nose.noarch 0:0.10.4-3.1.el6
  python-paste.noarch 0:1.7.4-2.el6
  python-paste-deploy.noarch 0:1.3.3-2.1.el6
  python-paste-script.noarch 0:1.7.3-5.el6_3
  python-peak-rules.noarch 0:0.5a1.dev-9.2582.1.el6
  python-peak-util-addons.noarch 0:0.6-4.1.el6
  python-peak-util-assembler.noarch 0:0.5.1-1.el6
  python-peak-util-extremes.noarch 0:1.1-4.1.el6
  python-peak-util-symbols.noarch 0:1.0-4.1.el6
  python-prioritized-methods.noarch 0:0.2.1-5.1.el6
  python-pygments.noarch 0:1.1.1-1.el6
  python-pylons.noarch 0:0.9.7-2.el6
  python-repoze-tm2.noarch 0:1.0-0.5.a4.el6
  python-repoze-what.noarch 0:1.0.8-6.el6
  python-repoze-what-pylons.noarch 0:1.0-4.el6
  python-repoze-who.noarch 0:1.0.18-1.el6
  python-repoze-who-friendlyform.noarch 0:1.0-0.3.b3.el6
  python-repoze-who-testutil.noarch 0:1.0-0.4.rc1.el6
  python-routes.noarch 0:1.10.3-2.el6
  python-setuptools.noarch 0:0.6.10-3.el6
  python-sqlalchemy.noarch 0:0.5.5-3.el6_2
  python-tempita.noarch 0:0.4-2.el6
  python-toscawidgets.noarch 0:0.9.8-1.el6
  python-transaction.noarch 0:1.0.1-1.el6
  python-turbojson.noarch 0:1.2.1-8.1.el6
  python-weberror.noarch 0:0.10.2-2.el6
  python-webflash.noarch 0:0.1-0.2.a9.el6
  python-webhelpers.noarch 0:0.6.4-4.el6
  python-webob.noarch 0:0.9.6.1-3.el6
  python-webtest.noarch 0:1.2-2.el6
  python-zope-filesystem.x86_64 0:1-5.el6
  python-zope-interface.x86_64 0:3.5.2-2.1.el6
  python-zope-sqlalchemy.noarch 0:0.4-3.el6
  Complete!
[root@HAmanager ~]#

  

  2. Start the luci service
[root@HAmanager ~]# service luci start

  Adding following auto-detected host IDs (IP addresses/domain names), corresponding to `HAmanager.localdomain' address, to the configuration of self-managed certificate `/var/lib/luci/etc/cacert.config' (you can change them by editing `/var/lib/luci/etc/cacert.config', removing the generated certificate `/var/lib/luci/certs/host.pem' and restarting luci):
  (none suitable found, you can still do it manually as mentioned above)
  

  Generating a 2048 bit RSA private key
  writing new private key to '/var/lib/luci/certs/host.pem'
  Starting saslauthd: [  OK  ]
  Start luci...
  Point your web browser to https://HAmanager.localdomain:8084 (or equivalent) to access luci
[root@HAmanager ~]# chkconfig luci on

  

  IV. Creating and Configuring the Cluster
  1. Open the luci web management interface at https://192.168.10.150:8084 in a browser
  (luci login page; screenshots omitted)

  

  2. Create the cluster and add both nodes to it
  (Cluster created and both nodes added in the luci web UI; screenshots omitted.)
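
  The step above uses the luci web UI. Roughly the same cluster can be created from the command line with the ccs tool; a hedged sketch, assuming the cluster name TestCluster2 that appears in the clustat output later and the ricci password set in step 7:

[root@node1 ~]# ccs -h node1 --createcluster TestCluster2
[root@node1 ~]# ccs -h node1 --addnode node1.localdomain
[root@node1 ~]# ccs -h node1 --addnode node2.localdomain
[root@node1 ~]# ccs -h node1 --sync --activate        # push /etc/cluster/cluster.conf to both nodes
[root@node1 ~]# ccs -h node1 --startall               # start cman and rgmanager on every node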

  

  3. Add vCenter as a fence device
  (vCenter added as a fence device in the luci web UI; screenshot omitted.)

  

  4. Look up the nodes' virtual machine UUIDs
[root@node1 ~]# fence_vmware_soap -a 192.168.10.91 -z -l administrator@vsphere.local -p P@ssw0rd -o list

  node1,564df192-7755-9cd6-8a8b-45d6d74eabbb
  node2,564df4ed-cda1-6383-bbf5-f99807416184
  

  5. Add a fence method and fence instance for each node
  (Fence method and fence instance added for each node in the luci web UI; screenshots omitted.)
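
  The same fence configuration can also be expressed with ccs. A sketch, assuming the fence device name vcenter_fence shown in the ccs_tool output later, the vCenter address and credentials from step 4, and the UUIDs listed above (the options accepted by fence_vmware_soap can be checked with ccs -h node1 --lsfenceopts fence_vmware_soap):

[root@node1 ~]# ccs -h node1 --addfencedev vcenter_fence agent=fence_vmware_soap ipaddr=192.168.10.91 login=administrator@vsphere.local passwd=P@ssw0rd ssl=on
[root@node1 ~]# ccs -h node1 --addmethod 1 node1.localdomain
[root@node1 ~]# ccs -h node1 --addfenceinst vcenter_fence node1.localdomain 1 uuid=564df192-7755-9cd6-8a8b-45d6d74eabbb
[root@node1 ~]# ccs -h node1 --addmethod 1 node2.localdomain
[root@node1 ~]# ccs -h node1 --addfenceinst vcenter_fence node2.localdomain 1 uuid=564df4ed-cda1-6383-bbf5-f99807416184
[root@node1 ~]# ccs -h node1 --sync --activate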

  

  6. Check the fence device status
[root@node1 ~]# fence_vmware_soap -a 192.168.10.91 -z -l administrator@vsphere.local -p P@ssw0rd -o status

  Status: ON
  

  7. Test the fence configuration
[root@node2 ~]# fence_check

  fence_check run at Tue May 23 09:41:30 CST 2017 pid: 3455
  Testing node1.localdomain method 1: success
  Testing node2.localdomain method 1: success
  

  8. Create a failover domain
  (Failover domain created in the luci web UI; screenshots omitted.)
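
  A CLI sketch of an equivalent failover domain; the domain name webfd and the priorities here are illustrative, not taken from the screenshots:

[root@node1 ~]# ccs -h node1 --addfailoverdomain webfd ordered
[root@node1 ~]# ccs -h node1 --addfailoverdomainnode webfd node1.localdomain 1
[root@node1 ~]# ccs -h node1 --addfailoverdomainnode webfd node2.localdomain 2
[root@node1 ~]# ccs -h node1 --sync --activate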

  

  9. Add cluster resources: an IP address resource and a script resource
  (IP address and script resources added in the luci web UI; screenshots omitted.)
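
  The two resources added here, the floating IP 192.168.10.103 and the httpd init script (the failover-test logs later show a script resource), correspond roughly to these ccs commands:

[root@node1 ~]# ccs -h node1 --addresource ip address=192.168.10.103 monitor_link=on
[root@node1 ~]# ccs -h node1 --addresource script name=httpd file=/etc/init.d/httpd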

  

  10. Create a cluster service group and add the existing resources to it
  (Service group TestServGrp created and the resources added in the luci web UI; screenshots omitted.)
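
  And a hedged sketch of the service group TestServGrp that ties the resources together; the failover domain name matches the sketch in step 8 and the recovery policy is an assumption:

[root@node1 ~]# ccs -h node1 --addservice TestServGrp domain=webfd recovery=relocate autostart=1
[root@node1 ~]# ccs -h node1 --addsubservice TestServGrp ip ref=192.168.10.103
[root@node1 ~]# ccs -h node1 --addsubservice TestServGrp script ref=httpd
[root@node1 ~]# ccs -h node1 --sync --activate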

  

  11. Configure the quorum disk: install the iSCSI target service on HAmanager and present a 100 MB shared disk to both nodes
[root@HAmanager ~]# yum install scsi-target-utils -y

[root@HAmanager ~]# dd if=/dev/zero of=/iSCSIdisk/100m.img bs=1M seek=100 count=0

[root@HAmanager ~]# vi /etc/tgt/targets.conf

  <target iqn.2016-08.disk.rh6:disk100m>
  backing-store /iSCSIdisk/100m.img
  initiator-address 192.168.10.104    #for node1
  initiator-address 192.168.10.105    #for node2
  </target>
[root@HAmanager ~]# service tgtd start

[root@HAmanager ~]# chkconfig tgtd on

[root@HAmanager ~]# tgt-admin --show

  Target 1: iqn.2016-08.disk.rh6:disk100m
      System information:
          Driver: iscsi
          State: ready
      I_T nexus information:
      LUN information:
          LUN: 0
              Type: controller
              SCSI ID: IET     00010000
              SCSI SN: beaf10
              Size: 0 MB, Block size: 1
              Online: Yes
              Removable media: No
              Prevent removal: No
              Readonly: No
              Backing store type: null
              Backing store path: None
              Backing store flags:
          LUN: 1
              Type: disk
              SCSI ID: IET     00010001
              SCSI SN: beaf11
              Size: 105 MB, Block size: 512
              Online: Yes
              Removable media: No
              Prevent removal: No
              Readonly: No
              Backing store type: rdwr
              Backing store path: /iSCSIdisk/100m.img
              Backing store flags:
      Account information:
      ACL information:
          192.168.10.104
          192.168.10.105
[root@HAmanager ~]#

  

  12. Install iscsi-initiator-utils on both nodes and log in to the iSCSI target

[root@node1 ~]# yum install iscsi-initiator-utils

[root@node1 ~]# chkconfig iscsid on

[root@node1 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.10.150

[root@node1 ~]# iscsiadm -m node

[root@node1 ~]# iscsiadm -m node -T iqn.2016-08.disk.rh6:disk100m --login

  

[root@node2 ~]# yum install iscsi-initiator-utils

[root@node2 ~]# chkconfig iscsid on

[root@node2 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.10.150

[root@node2 ~]# iscsiadm -m node

[root@node2 ~]# iscsiadm -m node -T iqn.2016-08.disk.rh6:disk100m --login

  

  13. On node1, create partition sdb1 on the shared disk /dev/sdb

[root@node1 ~]# fdisk /dev/sdb

  In the interactive prompt, create a single primary partition covering the whole disk and write the table; this produces /dev/sdb1.
[root@node1 ~]# partprobe /dev/sdb1
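
  fdisk is interactive; one way to script the same single-partition layout (an illustrative sketch, assuming /dev/sdb is the 100 MB iSCSI LUN and currently has no partitions):

[root@node1 ~]# echo -e "n\np\n1\n\n\nw" | fdisk /dev/sdb    # new primary partition 1, accept default start/end, write
[root@node1 ~]# partprobe /dev/sdb
[root@node1 ~]# fdisk -l /dev/sdb                            # /dev/sdb1 should now be listed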

  

  14. On node1, initialize sdb1 as the quorum disk
[root@node1 ~]# mkqdisk -c /dev/sdb1 -l testqdisk

  mkqdisk v3.0.12.1
  Writing new quorum disk label 'testqdisk' to /dev/sdb1.
  WARNING: About to destroy all data on /dev/sdb1; proceed [N/y] ? y
  Initializing status block for node 1...
  Initializing status block for node 2...
  Initializing status block for node 3...
  Initializing status block for node 4...
  Initializing status block for node 5...
  Initializing status block for node 6...
  Initializing status block for node 7...
  Initializing status block for node 8...
  Initializing status block for node 9...
  Initializing status block for node 10...
  Initializing status block for node 11...
  Initializing status block for node 12...
  Initializing status block for node 13...
  Initializing status block for node 14...
  Initializing status block for node 15...
  Initializing status block for node 16...
[root@node1 ~]#

  

[root@node1 ~]# mkqdisk -L

  mkqdisk v3.0.12.1
  /dev/block/8:17:
  /dev/disk/by-id/scsi-1IET_00010001-part1:
  /dev/disk/by-path/ip-192.168.10.150:3260-iscsi-iqn.2016-08.disk.rh6:disk100m-lun-1-part1:
  /dev/sdb1:
  Magic:                eb7a62c2
  Label:                testqdisk
  Created:              Mon May 22 22:52:01 2017
  Host:                 node1.localdomain
  Kernel Sector Size:   512
  Recorded Sector Size: 512
  

[root@node1 ~]#

  

  15. On node2, verify that the quorum disk is also recognized
[root@node2 ~]# partprobe /dev/sdb1

[root@node2 ~]# mkqdisk -L

  mkqdisk v3.0.12.1
  /dev/block/8:17:
  /dev/disk/by-id/scsi-1IET_00010001-part1:
  /dev/disk/by-path/ip-192.168.10.150:3260-iscsi-iqn.2016-08.disk.rh6:disk100m-lun-1-part1:
  /dev/sdb1:
  Magic:                eb7a62c2
  Label:                testqdisk
  Created:              Mon May 22 22:52:01 2017
  Host:                 node1.localdomain
  Kernel Sector Size:   512
  Recorded Sector Size: 512
  

  16. Configure the cluster to use the quorum disk
  (Quorum disk settings configured in the luci web UI; screenshot omitted.)
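
  In cluster.conf this step adds a <quorumd> element that references the label written by mkqdisk. A hedged ccs equivalent; the interval, tko and votes values here are illustrative, not the ones used in the screenshot:

[root@node1 ~]# ccs -h node1 --setquorumd label=testqdisk interval=1 tko=10 votes=1
[root@node1 ~]# ccs -h node1 --sync --activate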

  

  17. Restart the cluster so the quorum disk configuration takes effect
[root@node1 ~]# ccs -h node1 --stopall

  node1 password:
  Stopped node2.localdomain
  Stopped node1.localdomain
  

[root@node1 ~]# ccs -h node1 --startall

  Started node2.localdomain
  Started node1.localdomain
[root@node1 ~]#

  

  18. Check the cluster status
[root@node1 ~]# clustat

  Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
  Member Status: Quorate
  

  Member Name                             ID  Status
  ------ ----                             ---- ------
  node1.localdomain                          1 Online, Local, rgmanager
  node2.localdomain                          2 Online, rgmanager
  /dev/block/8:17                           0 Online, Quorum Disk
  

  Service Name                   Owner (Last)                 State
  ------- ----                   ----- ------                 -----
  service:TestServGrp              node1.localdomain             started
[root@node1 ~]#

  

  19. Check the cluster node status
[root@node1 ~]# ccs_tool lsnode

  Cluster name: icpl_cluster, config_version: 21
  Nodename                        Votes  Nodeid   Fencetype
  node1.localdomain                    1    1    vcenter_fence
  node2.localdomain                    1    2    vcenter_fence
  

  20. Verify that the cluster configuration is in sync on all nodes
[root@node1 ~]# ccs -h node1 --checkconf

  All nodes in sync.
  

  21. Access the web service via the cluster IP
  (Web page reachable at the cluster IP; screenshot omitted.)
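
  The same check can be done from the shell on any machine that can reach the cluster IP; the page is served by whichever node currently owns TestServGrp (node1 according to the clustat output in step 18):

[root@HAmanager ~]# curl http://192.168.10.103/
This is Node1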

  

  V. Cluster Failover Testing
  1. Power off the active node; automatic failover works correctly
[root@node1 ~]# poweroff
[root@node1 ~]# tail -f /var/log/messages
May 23 10:29:26 node1 modclusterd: shutdown succeeded
May 23 10:29:26 node1 rgmanager[2125]: Shutting down
May 23 10:29:26 node1 rgmanager[2125]: Shutting down
May 23 10:29:26 node1 rgmanager[2125]: Stopping service service:TestServGrp
May 23 10:29:27 node1 rgmanager[2125]: [ip] Removing IPv4 address 192.168.10.103/24 from eth0
May 23 10:29:36 node1 rgmanager[2125]: Service service:TestServGrp is stopped
May 23 10:29:36 node1 rgmanager[2125]: Disconnecting from CMAN
May 23 10:29:52 node1 rgmanager[2125]: Exiting
May 23 10:29:53 node1 ricci: shutdown succeeded
May 23 10:29:54 node1 oddjobd: oddjobd shutdown succeeded
May 23 10:29:54 node1 saslauthd[2315]: server_exit : master exited: 2315
  

  

[root@node2 ~]# tail -f /var/log/messages
May 23 10:29:45 node2 rgmanager[2130]: Member 1 shutting down
May 23 10:29:45 node2 rgmanager[2130]: Starting stopped service service:TestServGrp
May 23 10:29:45 node2 rgmanager[5688]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
May 23 10:29:49 node2 rgmanager[2130]: Service service:TestServGrp started
May 23 10:30:06 node2 qdiskd[1480]: Node 1 shutdown
May 23 10:30:06 node2 corosync[1437]:  [QUORUM] Members[1]: 2
May 23 10:30:06 node2 corosync[1437]:  [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 10:30:06 node2 corosync[1437]:  [CPG   ] chosen downlist: sender r(0) ip(192.168.10.105); members(old:2 left:1)
May 23 10:30:06 node2 corosync[1437]:  [MAIN  ] Completed service synchronization, ready to provide service
May 23 10:30:06 node2 kernel: dlm: closing connection to node 1
May 23 10:30:06 node2 qdiskd[1480]: Assuming master role
  

[root@node2 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate


Member Name                             ID  Status
------ ----                             ---- ------
node1.localdomain                          1 Online, Local, rgmanager
node2.localdomain                          2 Online, rgmanager
/dev/block/8:17                           0 Online, Quorum Disk


Service Name                   Owner (Last)                 State
------- ----                   ----- ------                 -----
service:TestServGrp              node2.localdomain             started
[root@node2 ~]#

  

  2. Stop the application service on the active node; automatic failover works correctly

  [root@node2 ~]# /etc/init.d/httpd stop
  [root@node2 ~]# tail -f /var/log/messages
  May 23 11:14:02 node2 rgmanager[11264]: [script] Executing /etc/init.d/httpd status
  May 23 11:14:02 node2 rgmanager[11289]: [script] script:icpl: status of /etc/init.d/httpd failed (returned 3)
  May 23 11:14:02 node2 rgmanager[2127]: status on script "httpd" returned 1 (generic error)
  May 23 11:14:02 node2 rgmanager[2127]: Stopping service service:TestServGrp
  May 23 11:14:03 node2 rgmanager[11320]: [script] Executing /etc/init.d/httpd stop
  May 23 11:14:03 node2 rgmanager[11384]: [ip] Removing IPv4 address 192.168.10.103/24 from eth0
  May 23 11:14:08 node2 ricci[11416]: Executing '/usr/bin/virsh nodeinfo'
  May 23 11:14:08 node2 ricci[11418]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/2116732044'
  May 23 11:14:09 node2 ricci[11422]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1193918332'
  May 23 11:14:13 node2 rgmanager[2127]: Service service:TestServGrp is recovering
  May 23 11:14:17 node2 rgmanager[2127]: Service service:TestServGrp is now running on member 1
  

[root@node1 ~]# tail -f /var/log/messages

  May 23 11:14:20 node1 rgmanager[2130]: Recovering failed service service:TestServGrp
  May 23 11:14:20 node1 rgmanager[13006]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
  May 23 11:14:24 node1 rgmanager[13092]: [script] Executing /etc/init.d/httpd start
  May 23 11:14:24 node1 rgmanager[2130]: Service service:TestServGrp started
  May 23 11:14:58 node1 rgmanager[13280]: [script] Executing /etc/init.d/httpd status
  

  

[root@node1 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate


Member Name                             ID  Status
------ ----                             ---- ------
node1.localdomain                          1 Online, Local, rgmanager
node2.localdomain                          2 Online, rgmanager
/dev/block/8:17                           0 Online, Quorum Disk


Service Name                   Owner (Last)                 State
------- ----                   ----- ------                 -----
service:TestServGrp              node1.localdomain             started
[root@node1 ~]#

  

  3. Stop the network service on the active node; automatic failover works correctly
[root@node1 ~]# service network stop
  

[root@node2 ~]# tail -f /var/log/messages
May 23 22:11:16 node2 qdiskd[1480]: Assuming master role
May 23 22:11:17 node2 qdiskd[1480]: Writing eviction notice for node 1
May 23 22:11:17 node2 corosync[1437]:  [TOTEM ] A processor failed, forming new configuration.
May 23 22:11:18 node2 qdiskd[1480]: Node 1 evicted
May 23 22:11:19 node2 corosync[1437]:  [QUORUM] Members[1]: 2
May 23 22:11:19 node2 corosync[1437]:  [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 22:11:19 node2 corosync[1437]:  [CPG   ] chosen downlist: sender r(0) ip(192.168.10.105); members(old:2 left:1)
May 23 22:11:19 node2 corosync[1437]:  [MAIN  ] Completed service synchronization, ready to provide service.
May 23 22:11:19 node2 kernel: dlm: closing connection to node 1
May 23 22:11:19 node2 rgmanager[2131]: State change: node1.localdomain DOWN
May 23 22:11:19 node2 fenced[1652]: fencing node1.localdomain
May 23 22:11:58 node2 fenced[1652]: fence node1.localdomain success
May 23 22:11:59 node2 rgmanager[2131]: Taking over service service:TestServGrp from down member node1.localdomain
May 23 22:11:59 node2 rgmanager[6145]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
May 23 22:12:03 node2 rgmanager[6234]: [script] Executing /etc/init.d/httpd start
May 23 22:12:03 node2 rgmanager[2131]: Service service:TestServGrp started
May 23 22:12:35 node2 corosync[1437]:  [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 22:12:35 node2 corosync[1437]:  [QUORUM] Members[2]: 1 2
May 23 22:12:35 node2 corosync[1437]:  [QUORUM] Members[2]: 1 2
May 23 22:12:35 node2 corosync[1437]:  [CPG   ] chosen downlist: sender r(0) ip(192.168.10.105); members(old:1 left:0)
May 23 22:12:35 node2 corosync[1437]:  [MAIN  ] Completed service synchronization, ready to provide service.
May 23 22:12:41 node2 rgmanager[6425]: [script] Executing /etc/init.d/httpd status
May 23 22:12:43 node2 qdiskd[1480]: Node 1 shutdown
May 23 22:12:55 node2 kernel: dlm: got connection from 1
May 23 22:13:08 node2 rgmanager[2131]: State change: node1.localdomain UP
  

[root@node2 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate


Member Name                             ID  Status
------ ----                             ---- ------
node1.localdomain                          1 Online, Local, rgmanager
node2.localdomain                          2 Online, rgmanager
/dev/block/8:17                           0 Online, Quorum Disk


Service Name                   Owner (Last)                 State
------- ----                   ----- ------                 -----
service:TestServGrp              node2.localdomain             started
[root@node2 ~]#

  Appendix: RHCS terminology
1. Cluster Manager (CMAN)
Manages cluster membership and tracks the running state of each member.

2. Distributed Lock Manager (DLM)
Every node runs a DLM daemon; when one node operates on a piece of metadata, the other nodes are notified and may only read that metadata.

3. Cluster Configuration System (CCS)
Manages and synchronizes the cluster configuration file. Each node runs the CCS daemon; when a change to /etc/cluster/cluster.conf is detected, the change is immediately propagated to the other nodes.

4. Fence devices
How fencing works: when the active host fails, the standby host calls the fence device to reboot the failed host. Once the fence operation succeeds, the fence device reports back to the standby, which then takes over the failed host's services and resources.

5. Conga cluster management software
Conga consists of two parts: luci and ricci. luci is the service that runs on the cluster management server, while ricci runs on every cluster node (luci can also be installed on a node). Cluster management and configuration are carried out through communication between these two services, and the RHCS cluster can be managed through Conga's web interface.

6. High-availability service management (rgmanager)
Monitors node services and provides failover: when a service on one node fails, it is relocated to another healthy node.
