[Repost] HAProxy Health Check Mechanism - haproxy - 运维网

guzan posted on 2015-9-5 07:23:32

[运维网] HAProxy Health Check Mechanism

　　http://blog.chinaunix.net/uid-10249062-id-163273.html
　　
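　　Note: the HAProxy version discussed here is 1.4.6
1 Overview
　　As a load balancer, HAProxy supports health checks on its backend servers. When a backend server can no longer serve traffic, requests arriving through the frontend are dispatched to the other servers that are still available, which keeps the service as a whole available.
2 Related configuration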

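[*]　　option httpchk <method> <uri> <version>
　　Enables layer-7 (HTTP) health checks.

[*]　　http-check disable-on-404
　　If a server answers a health check with 404, subsequent requests are no longer dispatched to it, except for those on persistent connections.

[*]　　http-check send-state
　　Adds a header to the health check request that reports the server state as seen by HAProxy, so that the server itself can see it. For example: X-Haproxy-Server-State: UP 2/3; name=bck/srv2; node=lb1; weight=1/2; scur=13/22; qcur=0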

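[*]　　server options
　　check: enables health checks
　　inter: interval between health checks
　　rise: number of consecutive successful checks required before the server is considered available
　　fall: number of consecutive failed checks required before the server is considered unavailable
　　error-limit: upper limit on consecutive failures while writing data to the server; once it is reached, the on-error setting is applied
　　observe <mode>: treats regular traffic as health check requests, i.e. real-time checking
　　on-error <mode>: action taken once error-limit is reached (fastinter, fail-check, sudden-death, mark-down). fastinter switches immediately to the fastinter check interval; fail-check counts the error as one failed check; sudden-death simulates a fatal failure, so that one more failure right after it marks the server down; mark-down marks the server down directly.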

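[*]　　Other
　　retries: number of times a failed connection is retried. If the server still cannot be reached after that many retries, the connection is dropped.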
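
　　The directives above could be combined roughly as follows. This is only a sketch: the /health URI, the address 192.0.2.10, and all numeric values are illustrative assumptions, not settings taken from the original post. The backend and server names are borrowed from the header example above.

```
backend bck
    mode http
    # layer-7 check: send "GET /health HTTP/1.0" to each server (URI is hypothetical)
    option httpchk GET /health HTTP/1.0
    # a 404 answer takes the server out of load balancing without marking it down
    http-check disable-on-404
    # add X-Haproxy-Server-State to every health check request
    http-check send-state
    retries 3
    # check every 2000 ms (500 ms in fastinter mode); 2 successes to go UP, 3 failures to go DOWN;
    # after 10 consecutive layer-7 errors on live traffic, count one failed check
    server srv2 192.0.2.10:80 check inter 2000 fastinter 500 rise 2 fall 3 observe layer7 error-limit 10 on-error fail-check
```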
3 Check mechanism
3.1 Related data structures
　　struct server {
　　......
　　int health; /* 0->rise-1 = bad; rise->rise+fall-1 = good */
　　int consecutive_errors; /* current number of consecutive errors */
　　int rise, fall; /* time in iterations */
　　int consecutive_errors_limit; /* number of consecutive errors that triggers an event */
　　short observe, onerror; /* observing mode: one of HANA_OBS_*; what to do on error: one of HANA_ONERR_* */
　　int inter, fastinter, downinter; /* checks: time in milliseconds */
　　......
　　}
3.2 Check flow

3.3 Server state transition conditions

[*]　　UP-->DOWN
　　Initially s->health = s->rise;
　　Check succeeds: if (s->health < s->rise + s->fall - 1) then s->health = s->rise + s->fall - 1;
　　Check fails: if (s->health > s->rise) then s->health--;
　　otherwise set_server_down(), s->health = 0; a fully healthy server therefore goes DOWN after fall consecutive failures.
[*]　　DOWN-->UP
　　Starts from s->health = 0;
　　Check succeeds: s->health++
　　if (s->health == s->rise) then set_server_up(), s->health = s->rise + s->fall - 1;
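
　　The transitions can be tried out with a small stand-alone model. The sketch below is not HAProxy code: struct mini_server and the mini_* functions are hypothetical names, and it assumes that a failed check decrements health only while health is above rise and otherwise marks the server DOWN, so that a fully healthy server needs fall consecutive failures to go DOWN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for HAProxy's struct server. */
struct mini_server {
    int health; /* 0..rise-1 = DOWN, rise..rise+fall-1 = UP */
    int rise;   /* consecutive successes needed to go UP */
    int fall;   /* consecutive failures needed to go DOWN */
};

static void mini_init(struct mini_server *s, int rise, int fall)
{
    assert(rise > 0 && fall > 0);
    s->rise = rise;
    s->fall = fall;
    s->health = rise; /* starts UP, but goes DOWN at the first failure */
}

static bool mini_is_up(const struct mini_server *s)
{
    return s->health >= s->rise;
}

/* Feed one check result into the health counter. */
static void mini_check(struct mini_server *s, bool ok)
{
    if (ok) {
        if (s->health < s->rise + s->fall - 1) {
            s->health++;
            /* once rise is reached, jump straight to the maximum */
            if (s->health >= s->rise)
                s->health = s->rise + s->fall - 1;
        }
    } else {
        if (s->health > s->rise)
            s->health--;   /* still UP */
        else
            s->health = 0; /* DOWN, counting restarts from zero */
    }
}
```

　　With rise = 2 and fall = 3, one success after mini_init() moves health from 2 to 4; three failures then take it through 3 and 2 to 0, and two successes are needed to bring the server back UP.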
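3.4 The observe mechanism
　　The observe mechanism analyzes regular request traffic and, when an error occurs while serving a request, calls the health_adjust function to update the counters used by the check mechanism in real time. The difference is that the check mechanism only probes on a timer, while observe reacts to live traffic; observe is built on top of the check mechanism. The effect of each on-error mode on s->health is listed below.
　　Note: the on-error mode is applied only once
　　s->consecutive_errors >= s->consecutive_errors_limit (the number of consecutive failed connections has reached the limit)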

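[*]　　fastinter
　　Leaves s->health unchanged, but reschedules the next check so that it fires after the fastinter interval.

[*]　　fail-check
　　Counts the failed connection as one failed check: s->health--

[*]　　sudden-death
　　Treats the failed connection as a near-fatal failure: s->health = s->rise + 1 and the error is then counted as a failed check, so one more failure marks the server DOWN

[*]　　mark-down
　　Marks the backend server DOWN as soon as the connection fails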
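
　　The four modes can be compared side by side in a small model. Again this is a sketch rather than HAProxy's health_adjust: the obs_* names and the fast_recheck flag are hypothetical. It assumes that the error counter is reset on a successful request and after an on-error action has run, that sudden-death also counts the error as a failed check, and it models the earlier re-check only for the fastinter mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; HAProxy's own constants are HANA_ONERR_*. */
enum obs_onerror { OBS_FASTINTER, OBS_FAIL_CHECK, OBS_SUDDEN_DEATH, OBS_MARK_DOWN };

struct obs_server {
    int health, rise, fall;
    int consecutive_errors;
    int consecutive_errors_limit;
    enum obs_onerror onerror;
    bool fast_recheck; /* true when the next check should use fastinter */
};

/* Same rule as a failed periodic check. */
static void obs_fail_check(struct obs_server *s)
{
    if (s->health > s->rise)
        s->health--;   /* still UP */
    else
        s->health = 0; /* DOWN */
}

/* Report the outcome of one regular (non-check) request. */
static void obs_report(struct obs_server *s, bool failed)
{
    if (!failed) {
        s->consecutive_errors = 0; /* assumption: a success breaks the streak */
        return;
    }
    s->consecutive_errors++;
    if (s->consecutive_errors < s->consecutive_errors_limit)
        return; /* limit not reached yet, nothing to do */

    switch (s->onerror) {
    case OBS_FASTINTER:
        s->fast_recheck = true; /* health is left untouched */
        break;
    case OBS_SUDDEN_DEATH:
        if (s->health > s->rise)
            s->health = s->rise + 1;
        obs_fail_check(s); /* assumption: also counted as a failed check */
        break;
    case OBS_FAIL_CHECK:
        obs_fail_check(s);
        break;
    case OBS_MARK_DOWN:
        s->health = 0;
        break;
    }
    s->consecutive_errors = 0; /* assumption: start a new streak */
}
```

　　For a server with rise = 2, fall = 3 and health = 4, sudden-death leaves health at 2 once the limit is reached, so the very next failed check takes it DOWN, whereas fail-check would need three such events.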
