ts2009 posted on 2018-5-31 08:17:29

OpenStack HA Cluster, Part 3

  Hostnames must be resolvable between all nodes:
  # cat /etc/hosts
  192.168.17.149 controller1
  192.168.17.141 controller2
  192.168.17.166 controller3
  192.168.17.111 demo.open-stack.cn
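  To keep /etc/hosts identical everywhere, you can push it to all nodes with the same ansible "controller" host group used later in this post (a sketch, assuming that group is already defined in your inventory):
  # ansible controller -m copy -a "src=/etc/hosts dest=/etc/hosts"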
  The nodes must trust each other, with passwordless SSH login:
  # ssh-keygen -t rsa
  Generating public/private rsa key pair.
  Enter file in which to save the key (/root/.ssh/id_rsa):
  Enter passphrase (empty for no passphrase):
  Enter same passphrase again:
  Your identification has been saved in /root/.ssh/id_rsa.
  Your public key has been saved in /root/.ssh/id_rsa.pub.
  The key fingerprint is:
  20:79:d4:a4:9f:8b:75:cf:12:58:f4:47:a4:c1:29:f3 root@controller1
  The key's randomart image is:
  +--[ RSA 2048]----+
  |      .o. ...oo|
  |   o ...o.o+   |
  |    o +   .+o .|
  |   o o +E.   |
  |      S o      |
  |       o o +   |
  |      . . . o    |
  |         .   |
  |               |
  +-----------------+
  # ssh-copy-id controller2
  # ssh-copy-id controller3
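  A quick loop to confirm that passwordless login works to every peer (BatchMode makes ssh fail instead of prompting if a key is missing):
  # for h in controller1 controller2 controller3; do ssh -o BatchMode=yes $h hostname; done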
  Configure the YUM repository
  # vim /etc/yum.repos.d/ha-clustering.repo
  [ha-clustering]
  name=Stable High Availability/Clustering packages (CentOS-7)
  type=rpm-md
  baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/
  gpgcheck=0
  gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/repodata/repomd.xml.key
  enabled=1
  This repository can conflict with the base repos. Keep it disabled (enabled=0) at first; if crmsh is the only package left to install, set enabled=1 and install it.
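  Alternatively, leave the repository disabled and enable it only for that one install (a sketch, assuming the [ha-clustering] repo ID used in the file above):
  # yum --enablerepo=ha-clustering install -y crmsh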
  Corosync download location; the latest release at the time of writing is 2.4.2:
  http://build.clusterlabs.org/corosync/releases/
  http://build.clusterlabs.org/corosync/releases/corosync-2.4.2.tar.gz
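  If you prefer building Corosync from that tarball instead of installing it from a repository, the usual autotools sequence applies (a sketch; you would also need build dependencies such as libqb-devel and nss-devel, and this post itself installs from yum below):
  # tar xzf corosync-2.4.2.tar.gz
  # cd corosync-2.4.2
  # ./configure && make && make install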
  # ansible controller -m copy -a "src=/etc/yum.repos.d/ha-clustering.repo dest=/etc/yum.repos.d/"
  Install the packages
  # yum install -y pacemaker pcs resource-agents cifs-utils quota psmisc corosync fence-agents-all lvm2
  # yum install -y crmsh
  Start pcsd and confirm it is running properly
  # systemctl enable pcsd
  # systemctl enable corosync
  # systemctl start pcsd
  # systemctl status pcsd
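  The per-node systemctl calls can also be run on all three nodes at once with the ansible service module (a sketch, again assuming the "controller" group):
  # ansible controller -m service -a "name=pcsd state=started enabled=yes"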
  # pacemakerd -$
  Pacemaker 1.1.15-11.el7_3.2
  Written by Andrew Beekhof
  # ansible controller -m command -a "pacemakerd -$"
  Set the hacluster user's password
  [all nodes] # echo zoomtech | passwd --stdin hacluster
  # ansible controller -m shell -a "echo zoomtech | passwd --stdin hacluster"
  # passwd hacluster
  Edit corosync.conf
  # vim /etc/corosync/corosync.conf
  totem {
      version: 2
      secauth: off
      cluster_name: openstack-cluster
      transport: udpu
  }
  nodelist {
      node {
          ring0_addr: controller1
          nodeid: 1
      }
      node {
          ring0_addr: controller2
          nodeid: 2
      }
      node {
          ring0_addr: controller3
          nodeid: 3
      }
  }
  logging {
      to_logfile: yes
      logfile: /var/log/cluster/corosync.log
      to_syslog: yes
  }
  quorum {
      provider: corosync_votequorum
  }
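  This config leaves secauth off, so cluster traffic is unauthenticated. If you want authenticated traffic, a sketch: generate a key once, copy it to every node, and set secauth: on in the totem section:
  # corosync-keygen                                 # writes /etc/corosync/authkey
  # scp /etc/corosync/authkey controller2:/etc/corosync/
  # scp /etc/corosync/authkey controller3:/etc/corosync/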
  # scp /etc/corosync/corosync.conf controller2:/etc/corosync/
  # scp /etc/corosync/corosync.conf controller3:/etc/corosync/
  # ansible controller -m copy -a "src=corosync.conf dest=/etc/corosync"
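  After distributing the file, it is worth confirming that every copy is identical, for example by comparing checksums:
  # ansible controller -m command -a "md5sum /etc/corosync/corosync.conf"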
  Create the cluster
  Authenticate the cluster nodes with pcs:
  # pcs cluster auth controller1 controller2 controller3 -u hacluster -p zoomtech --force
  controller3: Authorized
  controller2: Authorized
  controller1: Authorized
  Now create the cluster and add the nodes. Note that the cluster name must not exceed 15 characters:
  # pcs cluster setup --force --name openstack-cluster controller1 controller2 controller3
  Destroying cluster on nodes: controller1, controller2, controller3...
  controller3: Stopping Cluster (pacemaker)...
  controller2: Stopping Cluster (pacemaker)...
  controller1: Stopping Cluster (pacemaker)...
  controller2: Successfully destroyed cluster
  controller1: Successfully destroyed cluster
  controller3: Successfully destroyed cluster
  Sending cluster config files to the nodes...
  controller1: Succeeded
  controller2: Succeeded
  controller3: Succeeded
  Synchronizing pcsd certificates on nodes controller1, controller2, controller3...
  controller3: Success
  controller2: Success
  controller1: Success
  Restarting pcsd on the nodes in order to reload the certificates...
  controller3: Success
  controller2: Success
  controller1: Success
  Enable and start the cluster
  # pcs cluster enable --all
  controller1: Cluster Enabled
  controller2: Cluster Enabled
  controller3: Cluster Enabled
  # pcs cluster start --all
  controller2: Starting Cluster...
  controller1: Starting Cluster...
  controller3: Starting Cluster...
  Check the cluster status
  # ansible controller -m command -a "pcs cluster status"
  # pcs cluster status
  Cluster Status:
  Stack: corosync
  Current DC: controller3 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
  Last updated: Fri Feb 17 10:39:38 2017      Last change: Fri Feb 17 10:39:29 2017 by hacluster via crmd on controller3
  3 nodes and 0 resources configured
  PCSD Status:
  controller2: Online
  controller3: Online
  controller1: Online
  # ansible controller -m command -a "pcs status"
  # pcs status
  Cluster name: openstack-cluster
  Stack: corosync
  Current DC: controller2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
  Last updated: Thu Mar 2 17:07:34 2017      Last change: Thu Mar 2 01:44:44 2017 by root via cibadmin on controller1
  3 nodes and 1 resource configured
  Online: [ controller1 controller2 controller3 ]
  Full list of resources:
  vip    (ocf::heartbeat:IPaddr2):    Started controller2
  Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  Check the cluster status with crm_mon
  # ansible controller -m command -a "crm_mon -1"
  # crm_mon -1
  Stack: corosync
  Current DC: controller2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
  Last updated: Wed Mar 1 17:54:04 2017          Last change: Wed Mar 1 17:44:38 2017 by root via cibadmin on controller1
  3 nodes and 1 resource configured
  Online: [ controller1 controller2 controller3 ]
  Active resources:
  vip   (ocf::heartbeat:IPaddr2):    Started controller1
  Check the Pacemaker processes
  # ps aux | grep pacemaker
  root      75900  0.2  0.5 132632  9216 ?        Ss   10:39   0:00 /usr/sbin/pacemakerd -f
  haclust+  75901  0.3  0.8 135268 15376 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/cib
  root      75902  0.1  0.4 135608  7920 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/stonithd
  root      75903  0.0  0.2 105092  5020 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/lrmd
  haclust+  75904  0.0  0.4 126924  7636 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/attrd
  haclust+  75905  0.0  0.2 117040  4560 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/pengine
  haclust+  75906  0.1  0.5 145328  8988 ?        Ss   10:39   0:00 /usr/libexec/pacemaker/crmd
  root      75997  0.0  0.0 112648   948 pts/0    R+   10:40   0:00 grep --color=auto pacemaker
  Check the Corosync ring status (run on each node in turn)
  # corosync-cfgtool -s
  Printing ring status.
  Local node ID 1
  RING ID 0
  id    = 192.168.17.132
  status    = ring 0 active with no faults
  # corosync-cfgtool -s
  Printing ring status.
  Local node ID 2
  RING ID 0
  id    = 192.168.17.146
  status    = ring 0 active with no faults
  # corosync-cfgtool -s
  Printing ring status.
  Local node ID 3
  RING ID 0
  id    = 192.168.17.138
  status    = ring 0 active with no faults
  # corosync-cmapctl | grep members
  runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
  runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.17.132)
  runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
  runtime.totem.pg.mrp.srp.members.1.status (str) = joined
  runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
  runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.17.146)
  runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
  runtime.totem.pg.mrp.srp.members.2.status (str) = joined
  runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
  runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(192.168.17.138)
  runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
  runtime.totem.pg.mrp.srp.members.3.status (str) = joined
  Check the Corosync membership (run on each node in turn; note the (local) marker)
  # pcs status corosync
  Membership information
  ----------------------
  Nodeid      Votes Name
  1          1 controller1 (local)
  3          1 controller3
  2          1 controller2
  # pcs status corosync
  Membership information
  ----------------------
  Nodeid      Votes Name
  1          1 controller1
  3          1 controller3
  2          1 controller2 (local)
  # pcs status corosync
  Membership information
  ----------------------
  Nodeid      Votes Name
  1          1 controller1
  3          1 controller3 (local)
  2          1 controller2
  # crm_verify -L -V
  error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
  error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option
  error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity
  Errors found during check: config not valid
  #
  # pcs property set stonith-enabled=false
  # pcs property set no-quorum-policy=ignore
  # crm_verify -L -V
  # ansible controller -m command -a "pcs property set stonith-enabled=false"
  # ansible controller -m command -a "pcs property set no-quorum-policy=ignore"
  # ansible controller -m command -a "crm_verify -L -V"
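  Note that pcs property set writes to the cluster-wide CIB, which replicates automatically, so running it once on any node is enough; the ansible runs above are redundant but harmless. To confirm the properties took effect:
  # pcs property list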
  Configure the VIP
  # crm
  crm(live)# configure
  crm(live)configure# show
  node 1: controller1
  node 2: controller2
  node 3: controller3
  property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.15-11.el7_3.2-e174ec8 \
      cluster-infrastructure=corosync \
      cluster-name=openstack-cluster \
      stonith-enabled=false \
      no-quorum-policy=ignore
  crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.17.111 cidr_netmask=24 nic=ens37 op start interval=0s timeout=20s op stop interval=0s timeout=20s op monitor interval=30s meta priority=100
  crm(live)configure# show
  node 1: controller1
  node 2: controller2
  node 3: controller3
  primitive vip IPaddr2 \
      params ip=192.168.17.111 cidr_netmask=24 nic=ens37 \
      op start interval=0s timeout=20s \
      op stop interval=0s timeout=20s \
      op monitor interval=30s \
      meta priority=100
  property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.15-11.el7_3.2-e174ec8 \
      cluster-infrastructure=corosync \
      cluster-name=openstack-cluster \
      stonith-enabled=false \
      no-quorum-policy=ignore
  crm(live)configure# commit
  crm(live)configure# exit
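  For reference, the equivalent resource could also be created with pcs instead of crmsh (a sketch of the corresponding command; this post uses crmsh above):
  # pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.17.111 cidr_netmask=24 nic=ens37 \
      op start interval=0s timeout=20s op stop interval=0s timeout=20s op monitor interval=30s \
      meta priority=100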
  Verify that the VIP is now bound to the ens37 interface:
  # ip a
  4: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
  link/ether 00:0c:29:ff:8b:4b brd ff:ff:ff:ff:ff:ff
  inet 192.168.17.141/24 brd 192.168.17.255 scope global dynamic ens37
  valid_lft 2388741sec preferred_lft 2388741sec
  inet 192.168.17.111/24 brd 192.168.17.255 scope global secondary ens37
  valid_lft forever preferred_lft forever
  The NIC name given above must be identical on all three nodes; otherwise failover will break, because the VIP has no matching interface to move to.
  # crm status
  Stack: corosync
  Current DC: controller1 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
  Last updated: Wed Feb 22 11:42:07 2017      Last change: Wed Feb 22 11:22:56 2017 by root via cibadmin on controller1

  3 nodes and 1 resource configured

  Online: [ controller1 controller2 controller3 ]

  Full list of resources:

   vip    (ocf::heartbeat:IPaddr2):    Started controller1
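  A simple failover test: put the node currently hosting the VIP into standby, confirm the resource moves, then bring the node back (a sketch using crmsh):
  # crm node standby controller1     # vip should restart on another node
  # crm status
  # crm node online controller1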
  Verify that the Corosync engine started properly
  # grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
   controller1 corosync notice  Corosync Cluster Engine ('2.4.0'): started and ready to provide service.
  Mar 01 17:35:20 controller1      cib:   info: retrieveCib:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig)
  Mar 01 17:35:20 controller1      cib:warning: cib_file_read_and_verify:    Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)
  Mar 01 17:35:20 controller1      cib:warning: cib_file_read_and_verify:    Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)
  Mar 01 17:35:20 controller1      cib:   info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.Apziws (digest: /var/lib/pacemaker/cib/cib.0ZxsVW)
  Mar 01 17:35:21 controller1      cib:   info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.ObYehI (digest: /var/lib/pacemaker/cib/cib.O8Rntg)
  Mar 01 17:35:42 controller1      cib:   info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.eqrhsF (digest: /var/lib/pacemaker/cib/cib.6BCfNj)
  Mar 01 17:35:42 controller1      cib:   info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.riot2E (digest: /var/lib/pacemaker/cib/cib.SAqtzj)
  Mar 01 17:35:42 controller1      cib:   info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.Q8H9BL (digest: /var/lib/pacemaker/cib/cib.MBljlq)
  Mar 01 17:38:29 controller1      cib:   info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.OTIiU4 (digest: /var/lib/pacemaker/cib/cib.JnHr1v)
  Mar 01 17:38:36 controller1      cib:   info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.2cK9Yk (digest: /var/lib/pacemaker/cib/cib.JSqEH8)
  Mar 01 17:44:38 controller1      cib:   info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.aPFtr3 (digest: /var/lib/pacemaker/cib/cib.E3Ve7X)
  #
  Verify that the initial membership notifications were sent out properly
  # grep TOTEM /var/log/cluster/corosync.log
   controller1 corosync notice  Initializing transport (UDP/IP Unicast).
   controller1 corosync notice  Initializing transmit/receive security (NSS) crypto: none hash: none
   controller1 corosync notice  The network interface is now up.
   controller1 corosync notice  adding new UDPU member {192.168.17.149}
   controller1 corosync notice  adding new UDPU member {192.168.17.141}
   controller1 corosync notice  adding new UDPU member {192.168.17.166}
   controller1 corosync notice  A new membership (192.168.17.149:4) was formed. Members joined: 1
   controller1 corosync notice  A new membership (192.168.17.141:12) was formed. Members joined: 2 3
  Check whether any errors occurred during startup:
  # grep ERROR: /var/log/cluster/corosync.log
  