Building a High-Availability Cluster with corosync + pacemaker

iutyhrg posted on 2016-5-31 09:53:07

Corosync

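   corosync began as an application that demonstrated the OpenAIS cluster framework interface specification. It carries the heartbeat traffic of an HA cluster and is one of many HA cluster implementations; it can be regarded as a part of OpenAIS. Its later development went beyond what the project originally envisioned, and more and more vendors adopted corosync as their cluster solution. Red Hat's RHCS cluster suite, for example, is built on corosync.
   corosync provides only the message layer (that is, HeartBeat + CCM) and no CRM of its own; Pacemaker is normally used for resource management.
Pacemaker
   pacemaker is an open-source high-availability cluster resource manager (CRM). It sits at the resource-management and resource-agent (RA) layer of the HA cluster architecture. It cannot carry heartbeat traffic itself, so to talk to the peer node it relies on the underlying heartbeat messaging service to deliver information.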

[Figure: architecture of corosync and pacemaker]

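Corosync implements the message layer of the cluster: it delivers the cluster heartbeat and transaction messages.
Pacemaker manages the cluster's resources (CRM); the component that actually starts and stops cluster services is the RA (resource agent). RAs fall into the following classes:
   LSB: scripts under /etc/rc.d/init.d/*, supporting at least start, stop, restart, status, reload, force-reload;
               Note: must not start automatically at boot; the CRM starts it                   //the class used on CentOS 6
   OCF: /usr/lib/ocf/resource.d/provider/, similar to LSB scripts but supporting start, stop, status, monitor, meta-data;
   STONITH: invokes the functions of a STONITH device
   systemd: unit files under /usr/lib/systemd/system/
               Note: the service must be enabled (start at boot);                                        //supported on CentOS 7
   service: invokes a user-defined script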

Lab environment:
   VM IP: 172.18.250.77 node1.magedu.com CentOS7
   VM IP: 172.18.250.78 node2.magedu.com CentOS7

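I. Installing corosync and pacemaker
Before installing, confirm that the clocks on the nodes are synchronized, that the firewall and SELinux will not get in the way, that the nodes can reach each other by hostname, and that the nodes can authenticate to each other with SSH keys.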

]# hostname
node1.magedu.com
]# hostname
node2.magedu.com
]# ntpdate 172.18.0.1                //sync the time on both nodes against a time server
]# systemctl stop firewalld.service    //stop the firewall
]# getenforce                         //make sure SELinux is off
Disabled

Install the packages:

]# yum -y install corosync pacemaker
]# rpm -ql corosync
/etc/corosync
/etc/corosync/corosync.conf.example             //sample corosync configuration file
/etc/corosync/corosync.conf.example.udpu       //sample configuration using UDP unicast (udpu)
/etc/corosync/corosync.xml.example             //sample configuration in XML format
/usr/sbin/corosync                            //the daemon binary

]# vim /etc/corosync/corosync.conf
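# The // remarks are annotations for the reader; leave them out of the real file.
totem {
    version: 2
    cluster_name: mycluster       //cluster name
    crypto_cipher: aes128          //symmetric cipher
    crypto_hash: sha1             //hash algorithm
    interface {
        ringnumber: 0            //ring number; on a host with several NICs this keeps heartbeat traffic from looping back
        bindnetaddr: 172.18.0.0  //network to bind the heartbeat to; corosync works out which local IP address belongs to this network and uses that interface for multicast heartbeat traffic
        mcastaddr: 239.25.1.1    //multicast address for heartbeat traffic (must be identical on all nodes)
        mcastport: 5405          //port used for multicast
        ttl: 1                   //heartbeat packets travel a single hop, which avoids multicast loops
    }
}
logging {                      //logging; the defaults are fine
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: no                            //whether to log to syslog
    debug: off
    timestamp: on                   //whether to print timestamps; helps locate errors, but causes many system calls and costs CPU
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {                                     //voting system
    provider: corosync_votequorum       //which quorum provider to use
    #expected_votes: 8                //total number of votes
    two_node: 1                      //special handling for a two-node cluster
}
nodelist {                               //node list
    node {
        ring0_addr: 172.18.250.77    //node IP
        nodeid: 1                   //node ID
    }
    node {
        ring0_addr: 172.18.250.78
        nodeid: 2
    }
}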
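
To act on every node named in the nodelist (for example, to copy the key and the configuration to each one), the addresses can be pulled out of the configuration text. This is only a minimal sketch: the function name ring0_addrs is made up for illustration, and it assumes each address sits on its own "ring0_addr: <ip>" line as in the file above.

```shell
# ring0_addrs: print the ring0_addr value of each node entry read from stdin.
# Hypothetical helper; assumes one "ring0_addr: <ip>" pair per line.
ring0_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }'
}

# Usage (not run here):
#   ring0_addrs < /etc/corosync/corosync.conf
```

Reading from stdin keeps the helper independent of where the configuration file lives.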

Create the shared key used to authenticate heartbeat traffic between the cluster nodes:

]# corosync-keygen --help
Usage: corosync-keygen [-k <keyfile>] [-l]
-k / --key-file=<filename> -Write to the specified keyfile
         instead of the default /etc/corosync/authkey.
-l / --less-secure -Use a less secure random number source
         (/dev/urandom) that is guaranteed not to require user
         input for entropy.This can be used when this
         application is used from a script.
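]# corosync-keygen -l          //generate the key from /dev/urandom (less secure, needs no keyboard input)
]# cd /etc/corosync/
]# ll
total 16
-r-------- 1 root root 128 May 29 18:16 authkey //created under /etc/corosync/ by default; make sure the mode is 400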
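
The mode check on authkey can also be scripted rather than read off the listing. A rough sketch, assuming GNU stat (the -c '%a' form, as on CentOS); the helper name mode_is is invented for this example:

```shell
# mode_is: succeed when FILE has exactly the octal permission MODE (e.g. 400).
# Hypothetical helper; relies on GNU stat's -c '%a' output format.
mode_is() {
    file=$1
    mode=$2
    [ "$(stat -c '%a' "$file")" = "$mode" ]
}

# Usage (not run here):
#   mode_is /etc/corosync/authkey 400 || chmod 400 /etc/corosync/authkey
```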

#Copy the key file and the configuration to the other node:
]# scp -p authkey corosync.conf root@172.18.250.78:/etc/corosync/
]# systemctl start corosync.service pacemaker.service
]# ss -uan    //confirm the ports are open
State    Recv-Q Send-Q    Local Address:Port          Peer Address:Port
UNCONN    0    0       172.18.250.77:5404          *:*
UNCONN    0    0       172.18.250.77:5405          *:*
UNCONN    0    0       239.25.1.1:5405             *:*

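#After startup, run a series of checks to confirm that each component is working:
grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
//run the same command on the other node to confirm that the Cluster Engine started properly
tail /var/log/cluster/corosync.log
//check that membership between the two corosync nodes has been initialized; the nodes should start synchronizing cluster transaction information
grep "TOTEM" /var/log/cluster/corosync.log    // check that the initial membership notification was sent
grep ERROR /var/log/cluster/corosync.log       // check for errors during startup

]# crm_mon                         //view the cluster nodes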
Last updated: Sun May 29 21:53:09 2016    Last change: Sun May 29 20:09:08 2016 by root via cibadmin on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ node1.magedu.com node2.magedu.com ]

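At this point corosync is working. The next step is to configure service resources. pacemaker is only a resource manager and ships no management interface of its own, so the CRM is administered through one of two kinds of interface:
   CLI: command-line interfaces
            crmsh (SUSE)
            pcs
   GUI: graphical interfaces
            HB_GUI
            Conga (luci/ricci): web interface

   crmsh provides an interactive command-line interface for managing a Pacemaker cluster. It is powerful and easy to use, and is widely deployed; pcs is a similar tool. Note: configuration made through the crm interface is synchronized to every node.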

II. Installing crmsh

]# ls
crmsh-2.1.4-1.1.x86_64.rpm  pssh-2.3.1-4.2.x86_64.rpm  python-pssh-2.3.1-4.2.x86_64.rpm
]# yum -y install *.rpm
]# systemctl start corosync.service pacemaker.service

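Features of crm:
   1. Configuration changes take effect only after commit;
   2. A resource must be stopped before it can be deleted;
   3. help COMMAND shows the help for that command;
   4. TAB completion works as on the Linux command line.

crm can be used in two ways:
1. Command-line mode: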

]# crm status
Last updated: Mon May 30 10:07:02 2016    Last change: Mon May 30 10:06:50 2016 by hacluster via crmd on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured

Online: [ node1.magedu.com node2.magedu.com ]

2. Interactive mode:

]# crm
crm(live)# status
Last updated: Mon May 30 10:07:08 2016    Last change: Mon May 30 10:06:50 2016 by hacluster via crmd on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured

Online: [ node1.magedu.com node2.magedu.com ]

crm(live)# help
cib          manage shadow CIBs       // CIB management
resource       resources management    // start, stop, and otherwise manage resources
configure    CRM cluster configuration // edit the cluster configuration
node          nodes management       // node management subcommands
history       CRM cluster history
site          Geo-cluster support
ra             resource agents information center // resource agent subcommands (everything related to resource agents lives here)
status       show cluster status // show the current cluster status
template       Edit and import a configuration from a template // edit or import a configuration template
script       Cluster script management // cluster script management

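Commonly used crm management commands:
1. Managing resource constraints, resource stickiness, and resource types    CIB: the cluster information base, which stores the cluster configuration and propagates it to the nodes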

crm(live)# configure
crm(live)configure# help
acl_target Define target access rights
_test       Help for command _test
clone       Define a clone                   //define a clone
colocation Colocate resources             //define a colocation constraint
commit       Commit the changes to the CIB    //save the configuration to the CIB
default-timeouts Set timeouts for operations to minimums from the meta-data
delete       Delete CIB objects             //delete a CIB object
edit       Edit CIB objects                //edit the configuration
erase       Erase the CIB
fencing_topology Node fencing order             //order in which nodes are fenced
filter       Filter CIB objects             //filter CIB objects
graph       Generate a directed graph
group       Define a group                   //define a group
load       Import the CIB from a file       //import the CIB from a file
location    A location preference
modgroup    Modify group
monitor    Add monitor operation to a primitive //monitor a resource
ms          Define a master-slave resource
node       Define a cluster node                //define a cluster node
op_defaults Set resource operations defaults
order       Order resources                      //order resources
primitive    Define a resource                      //define a resource
property    Set a cluster property                //set a cluster-wide property
verify       verify the CIB with crm_verify       // check the CIB syntax
show       display CIB objects                   // show the CIB configuration
rsc_defaults set resource defaults                // set resource defaults (stickiness)
location    a location preference                // location constraint: which node a resource runs on by default (when location scores are equal, the node with the higher default preference runs it)
order       order resources                   // the order in which resources start

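Resource constraints (the default score is 0):

1. location: location constraint, a resource's preference for particular nodes
2. colocation: whether resources prefer to be "together", that is, to run on the same node
3. order: the order in which resources are started and stopped

Resource types:
primitive: a basic resource, which runs on only one node
group: a group resource, which gathers all the resources that make up one HA service
clone: a clone; the same resource exists in several copies and can run on several nodes
multi-state (master/slave): a special kind of clone whose copies stand in a master/slave relationship

2. Managing resource state: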

crm(live)resource# help
cleanup    Cleanup resource status       //clean up resource status
demote       Demote a master-slave resource //demote a resource
failcount    Manage failcounts             //manage a resource's failure count
help       Show help (help topics for list of topics)
ls          List levels and commands       //list levels and commands
maintenance Enable/disable per-resource maintenance mode
manage       Put a resource into managed mode
meta       Manage a meta attribute       //manage a resource's meta attributes
migrate    Migrate a resource to another node //force a resource to migrate
param       Manage a parameter of a resource    //manage a resource's parameters
promote    Promote a master-slave resource //promote a resource
quit       Exit the interactive shell    //quit
refresh    Refresh CIB from the LRM status
reprobe    Probe for resources not started by the CRM
restart    Restart a resource             //restart a resource
scores       Display resource scores
secret       Manage sensitive parameters
start       Start a resource
status       Show status of resources       //show resource status
stop       Stop a resource
trace       Start RA tracing             //start RA tracing
unmanage    Put a resource into unmanaged mode
unmigrate    Unmigrate a resource to another node
untrace    Stop RA tracing
up          Go back to previous level
utilization Manage a utilization attribute

3. Managing nodes:

crm(live)# node
crm(live)node# help
attribute    Manage attributes
cd          Navigate the level structure
clearstate Clear node state             //clear node state
delete       Delete node                   //delete a node
fence       Fence node                   //fence a node
help       Show help (help topics for list of topics)
ls          List levels and commands
maintenance Put node into maintenance mode
online       Set node online             //bring a node online
quit       Exit the interactive shell
ready       Put node into ready mode
show       Show node                   //show node information
standby    Put node into standby       //take a node offline (standby)
status       Show nodes' status as XML    //show node status as XML
status-attr Manage status attributes
up          Go back to previous level
utilization Manage utilization attributes

4. RA (resource agents): the component that actually starts and stops services

crm(live)ra# help
cd          Navigate the level structure
classes    List classes and providers       // list resource agent classes
help       Show help (help topics for list of topics)
info       Show meta data for a RA          //show a resource agent's metadata
list       List RA for a class (and provider) //list the services the RAs in a class can manage
ls          List levels and commands
providers    Show providers for a RA and a class
quit       Exit the interactive shell
up          Go back to previous level

Example: making httpd highly available

crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=172.18.250.79 op monitor interval=20 timeout=20
crm(live)configure# primitive webserver systemd:httpd op monitor interval=20 timeout=20
crm(live)configure# verify
crm(live)configure# commit
crm(live)# status
Last updated: Mon May 30 10:49:44 2016    Last change: Mon May 30 10:49:27 2016 by root via cibadmin on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ node1.magedu.com node2.magedu.com ]

vip (ocf::heartbeat:IPaddr): Started node1.magedu.com
webserver (systemd:httpd): Started node2.magedu.com

//webserver was placed on node2 automatically; pacemaker spreads resources across the nodes by default

crm(live)configure# group webservice vip webserver //define a group
crm(live)configure# verify
crm(live)configure# commit
crm(live)# status                                  //both resources are now bound to node1
Last updated: Mon May 30 10:52:53 2016    Last change: Mon May 30 10:52:31 2016 by root via cibadmin on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ node1.magedu.com node2.magedu.com ]

Resource Group: webservice
vip (ocf::heartbeat:IPaddr): Started node1.magedu.com
webserver (systemd:httpd): Started node1.magedu.com

Test whether httpd is now highly available:

Put node1 into standby manually:

]# crm node standby             //run on 172.18.250.77 to take node1 offline

The resources migrated, and httpd stayed available:

crm(live)# status
Last updated: Mon May 30 10:59:04 2016    Last change: Mon May 30 10:57:45 2016 by root via crm_attribute on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Node node1.magedu.com: standby
Online: [ node2.magedu.com ]

Resource Group: webservice
vip (ocf::heartbeat:IPaddr): Started node2.magedu.com
webserver (systemd:httpd): Started node2.magedu.com

Example: a highly available mariadb service backed by NFS

]# systemctl enable mariadb.service          //enable mariadb on both nodes
Created symlink from /etc/systemd/system/multi-user.target.wants/mariadb.service to /usr/lib/systemd/system/mariadb.service.
crm(live)configure# primitive webnfs ocf:heartbeat:Filesystem params device="172.18.250.76:/data" directory="/mydata/data" fstype="nfs" op monitor interval=20 timeout=20
   //define the NFS resource
crm(live)configure# primitive webmysql systemd:mariadb op monitor interval=20 timeout=20
   //define the mysql resource
crm(live)configure# group webservice vip webnfs webmysql //define a group
crm(live)configure# verify
crm(live)configure# commit

]# vim /etc/my.cnf

datadir=/mydata/data             //change the database data directory
]# chown mysql:mysql /mydata/data

Test whether mysql fails over when a node goes offline:

crm(live)# status          //currently running on node1
Last updated: Mon May 30 13:05:46 2016    Last change: Mon May 30 12:54:24 2016 by root via crm_attribute on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Online: [ node1.magedu.com node2.magedu.com ]

Resource Group: webservice
vip (ocf::heartbeat:IPaddr): Started node1.magedu.com
webnfs (ocf::heartbeat:Filesystem): Started node1.magedu.com
webmysql (systemd:mariadb): Started node1.magedu.com

]# crm node standby    //run standby on node1
crm(live)# status    //the resources have moved to node2
Last updated: Mon May 30 13:07:00 2016    Last change: Mon May 30 13:06:35 2016 by root via crm_attribute on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Node node1.magedu.com: standby
Online: [ node2.magedu.com ]

Resource Group: webservice
vip (ocf::heartbeat:IPaddr): Started node2.magedu.com
webnfs (ocf::heartbeat:Filesystem): Started node2.magedu.com
webmysql (systemd:mariadb): Started node2.magedu.com

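Note: corosync ships with a voting system. When a network partition occurs and the node holding the resources can no longer send heartbeats to the other nodes, the two sides stop coordinating, and the quorum mechanism is needed to decide where the resources run.
quorum means the legally required number of votes. When a partition occurs, each side checks whether the number of nodes it holds is greater than the total number of nodes divided by 2. The side that exceeds half has quorum: it takes over the resources and elects a DC. The side without quorum applies one of the following policies: stop (stop all resources, the default), ignore (carry on providing service), suicide, or freeze.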
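
The majority rule above comes down to one comparison. The following is only an illustrative sketch: has_quorum is a made-up helper, it assumes one vote per node, and it deliberately ignores the two-node special case set in corosync.conf.

```shell
# has_quorum: print "yes" if a partition holding VOTES out of TOTAL votes
# has a strict majority, otherwise "no".
# Hypothetical helper for illustration; ignores the two-node special case.
has_quorum() {
    votes=$1
    total=$2
    # votes > total / 2, written without division to avoid integer rounding
    if [ $((votes * 2)) -gt "$total" ]; then
        echo yes
    else
        echo no
    fi
}

# Usage:
#   has_quorum 2 3    # two of three nodes: a majority
#   has_quorum 1 2    # an even split: no majority
```

Multiplying instead of dividing matters for odd totals: with integer division, 3 / 2 would round down to 1 and a single node out of three would wrongly appear to have quorum.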

crm(live)configure# property no-quorum-policy=          //the default is stop
no-quorum-policy (enum, ): What to do when the cluster does not have quorum
What to do when the cluster does not have quorum Allowed values: stop, freeze, ignore, suicide

Resource monitoring:
primitive <rsc> {[<class>:[<provider>:]]<type>|@<template>}

...]

attr_list :: [$id=<id>] [<score>:]
         <attr>=<val> [<attr>=<val>...]] | $id-ref=<id>
id_spec :: $id=<id> | $id-ref=<id>
op_type :: start | stop | monitor
When defining a monitor operation, check the advisory default times first; the configured values must not be lower than them:

crm(live)ra# info systemd:mariadb
systemd unit file for mariadb (systemd:mariadb)

MariaDB database server

Operations' defaults (advisory minimum):

start       timeout=15
stop       timeout=15
status    timeout=15
restart    timeout=15
monitor    timeout=15 interval=15 start-delay=15
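
Comparing a planned timeout against the advisory minimum can be done mechanically on the text that "ra info" prints. A small sketch under stated assumptions: both helper names are invented here, the input is expected in the "<op> timeout=<n> ..." layout shown above, and values are plain numbers without unit suffixes.

```shell
# advisory_timeout: read "ra info" text on stdin and print the advisory
# timeout listed for operation OP. Hypothetical helper.
advisory_timeout() {
    awk -v op="$1" '$1 == op {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^timeout=/) { sub(/^timeout=/, "", $i); print $i }
    }'
}

# timeout_ok: succeed when CONFIGURED is not below MINIMUM.
timeout_ok() {
    [ "$1" -ge "$2" ]
}

# Usage (not run here):
#   min=$(crm ra info systemd:mariadb | advisory_timeout monitor)
#   timeout_ok 20 "$min" && echo "timeout=20 is acceptable"
```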

Resource constraints:
1. Defining a group
2. location and colocation
location <id> rsc {node_pref|rules}
colocation <id> <score>: <rsc>[:<role>] <with-rsc>[:<role>]

crm(live)configure# location webip_pre_node1 vip inf: node1.magedu.com
      //location constraint; inf: means infinity
crm(live)configure# colocation webip_webnfs_webmysql inf: vip webnfs webmysql
      //constraint that keeps the resources "together"
crm(live)configure# order webip_before_webnfs_webmysql mandatory: vip webnfs webmysql
      //start order of the resources; mandatory makes it compulsory

III. Installing ldirectord and making it highly available
   ldirectord builds on the load-balancing capability of LVS and adds health checks for the back-end hosts

]# ls
ldirectord-3.9.6-0rc1.1.1.x86_64.rpm
]# yum -y install *.rpm
]# systemctl enable ldirectord.service

Edit the configuration file:

]# vim /etc/ha.d/ldirectord.cf
# Global Directives
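checktimeout=3             //health-check timeout
checkinterval=1             //interval between health checks
#fallback=127.0.0.1:80       //server that answers when every back-end host is down
#fallback6=[::1]:80
autoreload=yes             //reload automatically when the configuration file changes
logfile="/var/log/ldirectord.log" //log file
#logfile="local0"                //or log to a syslog facility
#emailalert="admin@x.y.z"       //e-mail alerts
#emailalertfreq=3600
#emailalertstatus=all
quiescent=no                      //whether to use quiescent mode (failed hosts get weight 0 instead of being removed)

# Sample for an http virtual service
virtual=172.18.250.80:80          //the VIP
   real=172.18.250.76:80 gate //back-end host; forwarding method is DR (gate)
   real=172.18.250.75:80 gate //back-end host; forwarding method is DR (gate)
   fallback=127.0.0.1:80 gate
   service=http             //type of service the back-end hosts provide
   scheduler=rr             //scheduling algorithm: round robin
   #persistent=600          //connection persistence timeout
   #netmask=255.255.255.255
   protocol=tcp             //protocol
   checktype=negotiate       //check method
   checkport=80             //port to check
   request="index.html"       //page to request
   receive="Test Page"       //content the response must contain for the host to count as healthy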
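
The negotiate check amounts to fetching the request page and looking for the receive text in the response. The sketch below imitates that by hand so a back-end host can be tested before ldirectord is started. The helper name response_ok is invented, and matching with grep -E only approximates ldirectord's own matching.

```shell
# response_ok: succeed when BODY (the fetched page) matches the pattern EXPECTED.
# Hypothetical helper; grep -E only approximates ldirectord's own matching.
response_ok() {
    body=$1
    expected=$2
    printf '%s\n' "$body" | grep -Eq -- "$expected"
}

# Usage (not run here; assumes curl is installed):
#   response_ok "$(curl -s http://172.18.250.76/index.html)" "Test Page" \
#       && echo "back end healthy"
```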

Define the ldirectord resource in the cluster:

crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip="172.18.250.80" lvs_support="DR" op monitor interval=15 timeout=15
crm(live)configure# primitive director systemd:ldirectord op monitor interval=15 timeout=15
crm(live)configure# group dirservice vip director

crm(live)# status
Last updated: Mon May 30 14:18:50 2016    Last change: Mon May 30 14:18:43 2016 by root via cibadmin on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ node1.magedu.com node2.magedu.com ]

Resource Group: dirservice
vip (ocf::heartbeat:IPaddr2): Started node1.magedu.com
director (systemd:ldirectord): Started node1.magedu.com

Check ipvs:

]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port       Forward Weight ActiveConn InActConn
TCP  172.18.250.80:80 rr
-> 172.18.250.75:80          Route 1    0       0
-> 172.18.250.76:80          Route 1    0       0

Take one node offline and check that the resources fail over:

crm(live)# status             //failover succeeded
Last updated: Mon May 30 14:32:31 2016    Last change: Mon May 30 14:32:26 2016 by root via crm_attribute on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Node node1.magedu.com: standby
Online: [ node2.magedu.com ]

Resource Group: dirservice
vip (ocf::heartbeat:IPaddr2): Started node2.magedu.com
director (systemd:ldirectord): Started node2.magedu.com

]# ipvsadm -Ln                      //run on node2
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port       Forward Weight ActiveConn InActConn
TCP  172.18.250.80:80 rr
-> 172.18.250.75:80          Route 1    0       0
-> 172.18.250.76:80          Route 1    0       0

Take one back-end host offline and check that the failure is detected and the host is removed from service:

]# killall httpd             //stop httpd on a back-end host
]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port       Forward Weight ActiveConn InActConn
TCP  172.18.250.80:80 rr
-> 172.18.250.75:80          Route 1    0       0

]# service httpd start       //start httpd again
]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port       Forward Weight ActiveConn InActConn
TCP  172.18.250.80:80 rr
-> 172.18.250.75:80          Route 1    0       0
-> 172.18.250.76:80          Route 1    0       0 //re-added automatically
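
To watch a back-end host drop out and come back without reading the whole table each time, the real-server lines can be filtered out of the ipvsadm listing. A minimal sketch, assuming the output layout shown above; real_servers is a name made up for this example.

```shell
# real_servers: read "ipvsadm -Ln" output on stdin and print the
# address:port of each real server. Hypothetical helper.
# The header row also starts with "->", so only entries whose second
# field begins with a digit are kept.
real_servers() {
    awk '$1 == "->" && $2 ~ /^[0-9]/ { print $2 }'
}

# Usage (not run here):
#   ipvsadm -Ln | real_servers
```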

lisimba posted on 2019-1-20 15:07:33

Thanks for sharing, I'll study this.
