Building a High-Availability Cluster with DRBD Replicated Storage and an Apache Service

ahua671 · posted on 2018-11-27 09:59:49

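Introducing Pacemaker: Pacemaker is a cluster resource manager. It detects node and resource failures and recovers from them to keep cluster services highly available. It can use either corosync or heartbeat for messaging and membership.
Pacemaker's key features:
   · Detection of and recovery from node- and service-level failures
   · Support for STONITH to ensure data integrity
   · Support for both quorate and resource-driven clusters
   · Support for practically any redundancy configuration
   · Automatic replication of the configuration to all nodes
   · A unified, scriptable cluster shell
Pacemaker itself is made up of four key components:
   · CIB (Cluster Information Base)
   · CRMd (Cluster Resource Management daemon)
   · PEngine (PE, the Policy Engine)
   · STONITHd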
Setting up the HA platform
station41.example.com  192.168.10.41 primary node
station42.example.com  192.168.10.42 standby node
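RHEL 5.4 does not ship the pacemaker and corosync packages, so they have to be downloaded.
Download the online yum repository definition, then install directly with yum:
wget -O /etc/yum.repos.d/pacemaker.repo http://clusterlabs.org/rpm/epel-5/clusterlabs.repo
The contents of pacemaker.repo: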
[clusterlabs]
name=High Availability/Clustering server
baseurl=http://www.clusterlabs.org/rpm/epel-5
gpgcheck=0
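If the wget download is not possible, you can write the same repository file by hand. The sketch below is minimal. The function name write_clusterlabs_repo and its optional directory argument are hypothetical. The function only reproduces the file contents shown above, and by default writes them to /etc/yum.repos.d, the directory yum reads *.repo files from.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): writes the repo file shown
# above. The target directory defaults to /etc/yum.repos.d.
write_clusterlabs_repo() {
    dir=${1:-/etc/yum.repos.d}
    cat > "$dir/pacemaker.repo" <<'EOF'
[clusterlabs]
name=High Availability/Clustering server
baseurl=http://www.clusterlabs.org/rpm/epel-5
gpgcheck=0
EOF
}

# Usage on each node, as root:
#   write_clusterlabs_repo
#   yum repolist
```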
yum install pacemaker corosync --skip-broken -y
This fails with the following error, so pacemaker cannot be installed:
Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.11-1.2.el5.i386 (clusterlabs)
Install the libesmtp package:
wget ftp://ftp.univie.ac.at/systems/linux/fedora/epel/5/i386/libesmtp-1.0.4-5.el5.i386.rpm
rpm -ivh libesmtp-1.0.4-5.el5.i386.rpm
yum install pacemaker corosync -y
(Perform the steps above on both the primary and the standby node.)
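Run corosync-keygen to generate the authkey used for communication between the nodes. It reports that the authkey file was written to /etc/corosync. Copy it to the standby node:
scp /etc/corosync/authkey station42:/etc/corosync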
Edit the configuration file corosync.conf:
cd /etc/corosync
cp corosync.conf.example  corosync.conf
vi /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
version: 2
secauth: off
threads: 0
interface {
ringnumber: 0
# adjust bindnetaddr and mcastport to your environment
bindnetaddr: 192.168.10.0
mcastaddr: 226.94.1.1
mcastport: 5405
}
}
logging {
fileline: off
to_stderr: yes
to_logfile: yes
to_syslog: yes
# alternatively /var/log/cluster/corosync.log (see note 1 below)
logfile: /var/log/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}
amf {
mode: disabled
}
service {
# Load the Pacemaker Cluster Resource Manager
ver: 0
name: pacemaker
use_mgmtd: yes
}
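The configuration sets bindnetaddr to the network address (192.168.10.0), not to a node's own IP. The sketch below is minimal: it derives that network address from an IP and a dotted netmask by ANDing each octet. The function name network_addr is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compute the network address for bindnetaddr by
# ANDing each octet of the node's IPv4 address with the netmask.
network_addr() {
    addr=$1 mask=$2
    old_ifs=$IFS
    IFS=.
    set -- $addr
    a1=$1 a2=$2 a3=$3 a4=$4
    set -- $mask
    IFS=$old_ifs
    echo "$((a1 & $1)).$((a2 & $2)).$((a3 & $3)).$((a4 & $4))"
}

network_addr 192.168.10.41 255.255.255.0   # prints 192.168.10.0
```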
scp corosync.conf station42:/etc/corosync/
Start corosync on both the primary and the standby node:
/etc/init.d/corosync start
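Notes:
1. If the logfile parameter in the configuration file is /var/log/cluster/corosync.log, corosync fails to start because the /var/log/cluster directory does not exist.
2. After starting, the log shows corosync[2200]: [SERV ] Service failed to load 'pacemaker', i.e. pacemaker failed to load. None of the other guides I read mentioned this error. After spending a whole morning on it, I found that the net-snmp package was not installed (it is not pulled in automatically).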
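You can check for both pitfalls in the notes before starting corosync. The sketch below is minimal. It assumes the logfile setting sits on a single line in the "key: value" form used in the configuration above. The function name ensure_corosync_logdir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for note 1: read the "logfile:" path from corosync.conf
# and create its parent directory if it does not exist yet.
ensure_corosync_logdir() {
    conf=${1:-/etc/corosync/corosync.conf}
    logfile=$(awk -F: '$1 ~ /^[ \t]*logfile[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$conf")
    [ -n "$logfile" ] || return 0
    mkdir -p "$(dirname "$logfile")"
}

# Usage on each node, before /etc/init.d/corosync start:
#   ensure_corosync_logdir
#   rpm -q net-snmp || yum install -y net-snmp    # note 2: pacemaker failed to load without it
```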
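The platform is now set up. Let's look at the current configuration.
About the crm tool
Typing crm opens an interactive shell similar to a Linux shell. If you are unsure of a command, use help, for example configure help.
Perform the following steps on the primary node: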
#crm configure show
node station41.example.com
node station42.example.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2"
Check the configuration:
#crm_verify -L
crm_verify[2195]: 2011/08/24_16:57:12 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[2195]: 2011/08/24_16:57:12 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[2195]: 2011/08/24_16:57:12 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid -V may provide more details
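This error is expected. It is caused by STONITH, which we disable here. To configure STONITH instead, see
http://www.clusterlabs.org/doc/zh-CN/Pacemaker/1.1/html/Clusters_from_Scratch/index.html
#crm (enter interactive mode)
crm(live)# configure property stonith-enabled=false
Checking the configuration again now reports no errors.
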
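Add a VIP resource (a virtual IP that must not already be in use; clients reach the service through it):
crm (enter interactive mode)
crm(live)#configure (enter resource configuration mode)
crm(live)configure#primitive clusterip ocf:heartbeat:IPaddr2 \
params ip=192.168.10.2 cidr_netmask=32 nic=eth0:0 \
op monitor interval=30s
crm(live)configure#commit
(This defines a resource named clusterip. ocf:heartbeat:IPaddr2 invokes the script /usr/lib/ocf/resource.d/heartbeat/IPaddr2. The virtual IP is 192.168.10.2, and it is checked every 30s.)
Run ifconfig and you will see an additional virtual IP.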
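IPaddr2 adds the VIP with the ip command, so `ip -o -4 addr show` is a dependable way to see which node currently holds it. The sketch below is minimal: it looks for an exact address match in that output. The function name has_vip is hypothetical, and the sample line in the comment is an assumption about iproute2's one-line output format.

```shell
#!/bin/sh
# Hypothetical helper: succeed if IPv4 address $1 appears in
# `ip -o -4 addr show` output read from stdin. Assumed line format:
#   2: eth0    inet 192.168.10.2/32 scope global eth0:0
has_vip() {
    awk -v vip="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == "inet") { split($(i + 1), a, "/"); if (a[1] == vip) found = 1 }
    } END { exit !found }'
}

# Usage on a node:
#   ip -o -4 addr show | has_vip 192.168.10.2 && echo "VIP is on $(hostname)"
```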
Perform a failover test by stopping corosync on the primary node:
/etc/init.d/corosync stop
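Running crm status on the standby node shows that the standby node did not take over the service. This is because the cluster no longer has quorum. To protect data from corruption, Pacemaker stops all resources when it finds that the cluster lacks quorum.
A cluster considers itself to have quorum, i.e. to be "legitimate", when more than half of its nodes are online. As a formula: total_nodes < 2 * active_nodes.
Therefore a two-node cluster has quorum only when both nodes are online. This rule would make a two-node cluster pointless, but we can control what Pacemaker does when the cluster lacks quorum. Put simply, we tell the cluster to ignore it: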
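The following evaluates the quorum rule above for a two-node cluster. The sketch is minimal. The function name has_quorum is hypothetical, and it only encodes the total_nodes < 2 * active_nodes formula, not Pacemaker's full vote handling.

```shell
#!/bin/sh
# Hypothetical helper: a partition is quorate when
# total_nodes < 2 * active_nodes (more than half the nodes are active).
has_quorum() {
    total=$1 active=$2
    [ "$total" -lt $((2 * active)) ]
}

for n in 2 1; do
    if has_quorum 2 "$n"; then
        echo "2 nodes, $n active: quorate"
    else
        echo "2 nodes, $n active: no quorum"
    fi
done
# prints:
# 2 nodes, 2 active: quorate
# 2 nodes, 1 active: no quorum
```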
crm configure property no-quorum-policy=ignore
crm configure show (display the configuration)
crm status (show the current cluster status)
... ...
clusterip (ocf::heartbeat:IPaddr2): Started station42.example.com
The service has now moved to the standby node.
/etc/init.d/corosync start
crm status
... ...
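clusterip (ocf::heartbeat:IPaddr2): Started station41.example.com
When the primary node recovers, the resources automatically move back to it.
You can also use the following command to keep resources from moving back after a node recovers:
crm configure rsc_defaults resource-stickiness=100
(Better not to do this.)
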
Adding the Apache service resource
On both the primary and the standby node:
yum install httpd -y
echo `hostname` >/var/www/html/index.html
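So that the cluster can monitor Apache's health, edit the Apache configuration file /etc/httpd/conf/httpd.conf
and uncomment the following block:
<Location /server-status>
   SetHandler server-status
   Order deny,allow
   Deny from all
   Allow from 127.0.0.1
</Location>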
On the primary node:
crm configure primitive website ocf:heartbeat:apache \
params configfile=/etc/httpd/conf/httpd.conf op monitor interval=1min
This adds a resource named website that uses the OCF script called apache in the heartbeat namespace. The resource points at the configuration file and is checked every 60s.
crm status
......
clusterip (ocf::heartbeat:IPaddr2): Started station41.example.com
website (ocf::heartbeat:apache): Started station42.example.com
The two resources are not running on the same host. To make sure they run on the same host, run:
crm configure colocation website-with-ip inf: website clusterip
crm status
... ...
clusterip (ocf::heartbeat:IPaddr2): Started station41.example.com
website (ocf::heartbeat:apache): Started station41.example.com
The resources now run on the same host.
To test, open a browser at the VIP 192.168.10.2. The page shows station41.example.com.
After /etc/init.d/corosync stop on station41, the page shows station42.example.com.
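Controlling the order in which resources start and stop:
crm configure order apache-after-ip inf: clusterip website
(Resources start from left to right and stop from right to left.)
Specifying a preferred location (if the machines differ in capability, you can make a resource prefer the more powerful one):
crm configure location prefer-station41 website 50: station41.example.com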
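This example shows why stickiness and a location preference interact. It uses a simplified model with one resource and no other constraints: the node where the resource currently runs scores resource-stickiness, and the preferred node scores the location value. The function name pick_node is hypothetical. The real policy engine also adds the scores of colocated resources and has its own tie-breaking rules, and this sketch models neither.

```shell
#!/bin/sh
# Simplified model (hypothetical helper): one resource, no other constraints.
#   current node score   = resource-stickiness
#   preferred node score = location constraint score
# Equal scores are treated as "stay put"; that is an assumption of this sketch.
pick_node() {
    current=$1 stickiness=$2 preferred=$3 location_score=$4
    if [ "$current" = "$preferred" ] || [ "$stickiness" -ge "$location_score" ]; then
        echo "$current"
    else
        echo "$preferred"
    fi
}

# Resource running on station42 after a failover, station41 back online:
pick_node station42.example.com 100 station41.example.com 50   # prints station42.example.com
pick_node station42.example.com 0 station41.example.com 50     # prints station41.example.com
```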
　　
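Adding the DRBD replicated storage resource
For the DRBD setup steps, see my article "Synchronizing Disks with DRBD" (《使用 DRBD 同步磁盘》).
Configuring DRBD in the cluster: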
crm
crm(live)#configure primitive webdata ocf:linbit:drbd \
params drbd_resource=r0 \
op monitor interval=60s
crm(live)#configure ms webdataclone webdata meta master-max=1 master-node-max=1 \
      clone-max=2 clone-node-max=1 notify=true
crm(live)#configure show
crm(live)#configure commit
crm status
……
Master/Slave Set: webdataclone
Masters: [ station41.example.com ]
Slaves: [ station42.example.com ]
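To cross-check the Master/Slave roles on a node, you can read the local DRBD role from /proc/drbd. The sketch below is minimal and assumes the DRBD 8.3 status line format shown in the comment, where ro: gives local-role/peer-role. The function name drbd_local_role is hypothetical. Minor 1 corresponds to /dev/drbd1, the device used by the Filesystem resource in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the local role of DRBD minor $1 from
# /proc/drbd-style text on stdin. Assumes the DRBD 8.3 status line format:
#  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
drbd_local_role() {
    awk -v m="$1:" '$1 == m {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^ro:/) { split(substr($i, 4), r, "/"); print r[1]; exit }
    }'
}

# Usage on each node:
#   drbd_local_role 1 < /proc/drbd    # Primary on the Master, Secondary on the Slave
```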
　　
Configure a Filesystem resource that uses the DRBD device:
crm
crm(live)#configure primitive webfs ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/var/www/html" fstype="ext3"
crm(live)#configure colocation fs_on_drbd inf: webfs webdataclone:Master
crm(live)#configure order webfs-after-webdata inf: webdataclone:promote webfs:start
crm(live)#configure colocation website-with-webfs inf: website webfs
crm(live)#configure order website-after-webfs inf: webfs website
crm(live)#configure commit
Testing
crm node standby (run on station41; puts the local node into standby)
crm status
... ...
Node station41.example.com: standby
Online: [ station42.example.com ]
clusterip (ocf::heartbeat:IPaddr2): Started station42.example.com
website (ocf::heartbeat:apache): Started station42.example.com
Master/Slave Set: webdataclone
Masters: [ station42.example.com ]
Stopped: [ webdata:1 ]
webfs (ocf::heartbeat:Filesystem): Started station42.example.com

crm node online
crm status
... ...
Node station42.example.com: standby
Online: [ station41.example.com ]
clusterip (ocf::heartbeat:IPaddr2): Started station41.example.com
website (ocf::heartbeat:apache): Started station41.example.com
Master/Slave Set: webdataclone
Masters: [ station41.example.com ]
Stopped: [ webdata:1 ]
webfs (ocf::heartbeat:Filesystem): Started station41.example.com
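You can confirm the colocation constraints from the crm status output: every primitive should report the same node after "Started". The sketch below is minimal. The function name started_nodes is hypothetical, and it assumes "name (class::provider:type): Started node" lines like those shown above.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct node named after "Started" in
# crm status output read from stdin. A single output line means every
# started primitive runs on the same node.
started_nodes() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "Started") print $(i + 1) }' | sort -u
}

# Usage:
#   crm status | started_nodes
```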
Reference:
http://www.clusterlabs.org/doc/zh-CN/Pacemaker/1.1/html/Clusters_from_Scratch/index.html
