Using HAProxy to Defend Against Simple DDoS Attacks

Posted by fdhfgh on 2015-11-20 12:52:44

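Part 1: System-level protection

1. TCP SYN flood attacks

A SYN flood attack sends a large number of SYN packets to a server in order to saturate it, or at least to saturate its uplink bandwidth.
If the attack is large enough to fill all of your Internet bandwidth, the only option is to ask your ISP for help.
On the local HAProxy machine we can add some simple protection, which is better than nothing.

Edit /etc/sysctl.conf and add the following: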

# Protection SYN flood
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.tcp_max_syn_backlog = 1024
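
As a rough way to confirm that the file really contains the three settings above, the check can be sketched in Python. This is only an illustration: the helper names parse_sysctl and missing_settings are made up for this example, and the script reads text you pass in rather than touching the live kernel.

```python
# Illustrative helper (not part of HAProxy or sysctl): check sysctl.conf-style text.
REQUIRED = {
    "net.ipv4.tcp_syncookies": "1",
    "net.ipv4.conf.all.rp_filter": "1",
    "net.ipv4.tcp_max_syn_backlog": "1024",
}

def parse_sysctl(text):
    """Parse 'key = value' lines, ignoring blank lines and # or ; comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def missing_settings(text, required=REQUIRED):
    """Return the required keys that are absent or set to a different value."""
    current = parse_sysctl(text)
    return sorted(k for k, v in required.items() if current.get(k) != v)
```

The settings only take effect once the file has been reloaded, for example with sysctl -p, or after a reboot.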

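Part 2: HAProxy's protection features

HAProxy 1.5 added some interesting features that can be used to fend off small-scale attacks.
The main one is that stick-tables can now store additional information.

The syntax is:
stick-table type {ip | integer | string [len <length>] | binary [len <length>]} size <size> [expire <expire>] [nopurge] [peers <peersect>] [store <data_type>]*

Each proxy section (frontend, backend or listen) can have only one stick-table. The additional data types that can be stored in it are: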

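- server_id : the ID of the server the user's request was assigned to, stored as an integer.
            This value can be used by "stick match", "stick store" and "stick on" rules.
            It is automatically enabled when referenced.

- gpc0 : the first General Purpose Counter, a positive 32-bit integer.
      It can be used for anything. It is typically used to tag specific entries.

- conn_cnt : connection count, a positive 32-bit integer.
            It records the absolute number of connections received from clients matching this entry.
            This is the number of connections received, not necessarily the number accepted.

- conn_cur : current connections, a positive 32-bit integer.
            It is incremented when a new connection matches the entry
            and decremented when the connection ends.
            It gives the exact number of concurrent connections for an entry at any point in time.

- conn_rate(<period>) : connection rate (takes 12 bytes).
                     It measures the rate of incoming connections over the given period (in milliseconds unless a unit suffix such as "s" is given).
                     The value can be matched against rules in ACLs.

- sess_cnt : session count, a positive 32-bit integer.
            It records the absolute number of sessions received from clients matching this entry.
            A session is a connection that has been accepted by the layer 4 rules.

- sess_rate(<period>) : session rate (takes 12 bytes).
                     It measures the rate of incoming sessions over the given period (in milliseconds unless a unit suffix is given).
                     The value can be matched against rules in ACLs.

- http_req_cnt : HTTP request count, a positive 32-bit integer.
               It records the absolute number of HTTP requests received from clients matching this entry,
               whether the requests are valid or not.
               Note that this is different from sessions when keep-alive is used on the client side.

- http_req_rate(<period>) : HTTP request rate (takes 12 bytes).
                           It measures the rate of incoming HTTP requests over the given period (in milliseconds unless a unit suffix is given),
                           whether the requests are valid or not.
                           Note that this is different from sessions when keep-alive is used on the client side.

- http_err_cnt : HTTP error count, a positive 32-bit integer.
               It records the absolute number of HTTP errors matching this entry, including:
               invalid and truncated requests
               denied or tarpitted requests
               failed authentications
               4xx errors

- http_err_rate(<period>) : HTTP error rate (takes 12 bytes).
                           It measures the rate of HTTP errors generated by the matching entry over the given period (in milliseconds unless a unit suffix is given).

- bytes_in_cnt : the number of bytes sent from the client to the server for the matching entry, a positive 64-bit integer.
               Headers are included in the count. It is typically used on image or video servers to limit uploads.

- bytes_in_rate(<period>) : incoming byte rate (takes 12 bytes).
                           It measures the rate of bytes received over the given period (in milliseconds unless a unit suffix is given).
                           It is typically used to prevent users from uploading too much, too fast.
                           Warning: with large uploads, it is possible that the amount of uploaded data will be counted
                           once upon termination, thus causing spikes in the average transfer speed
                           instead of having a smooth one. This may partially be smoothed with
                           "option contstats" though this is not perfect yet. Use of bytes_in_cnt is
                           recommended for better fairness.

- bytes_out_cnt : the number of bytes sent from the server to the client, a positive 64-bit integer.
               Headers are included in the count. It is typically used to stop robots from crawling the site.

- bytes_out_rate(<period>) : outgoing byte rate (takes 12 bytes).
                              It measures the rate of bytes sent by the server to the client over the given period (in milliseconds unless a unit suffix is given).
                              It is typically used to prevent users from downloading too much, too fast.
                              Warning: with large transfers, it is possible that the amount of transferred data will be
                              counted once upon termination, thus causing spikes in the average
                              transfer speed instead of having a smooth one. This may partially be
                              smoothed with "option contstats" though this is not perfect yet. Use of
                              bytes_out_cnt is recommended for better fairness.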
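
The difference between a cumulative counter such as conn_cnt and a live counter such as conn_cur is easy to mix up. The following Python sketch models one table entry under simplifying assumptions: the class Entry and its methods are invented for illustration and say nothing about how HAProxy stores the data internally.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Simplified model of one stick-table entry (one client IP)."""
    conn_cnt: int = 0   # cumulative: only ever grows
    conn_cur: int = 0   # concurrent: grows on open, shrinks on close

    def open_connection(self):
        self.conn_cnt += 1
        self.conn_cur += 1

    def close_connection(self):
        self.conn_cur -= 1

entry = Entry()
for _ in range(3):
    entry.open_connection()
entry.close_connection()
print(entry)  # Entry(conn_cnt=3, conn_cur=2)
```

After three connections have been opened and one closed, conn_cnt still reports 3 while conn_cur has dropped to 2, which is why conn_cur is the value to test when limiting simultaneous connections.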

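2. Slow request attacks

In a Slowloris-style attack, the client sends its request to the server extremely slowly.
It typically sends one header at a time or, worse, one character at a time, and waits a long time between packets.
The server has to wait until the whole request has been received before it can return a response.
The purpose of the attack is to deny service to legitimate users, because all server resources are tied up waiting for slow requests.

The countermeasure in HAProxy is the "timeout http-request" option.
Setting it to 5 seconds should be long enough.
It tells HAProxy to wait at most 5 seconds for the client to send a complete HTTP request. If that takes longer, HAProxy returns an error and closes the connection.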
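
The effect of the timeout can be pictured with a small Python model. It is a deliberate simplification: the function handle_request is invented for this example, it treats the blank line that ends the headers as "request complete", and it does not reproduce HAProxy's real parser.

```python
def handle_request(chunks, timeout=5.0):
    """Decide what happens to one client connection.

    chunks is a list of (seconds_since_accept, bytes_received) tuples.
    Returns "forward" if the end of the headers (a blank line) arrives
    within `timeout` seconds of the connection being accepted, else "408".
    """
    buffer = b""
    for arrival, data in chunks:
        if arrival > timeout:
            return "408"          # deadline passed before the request was complete
        buffer += data
        if b"\r\n\r\n" in buffer:
            return "forward"      # complete request, pass it to a backend
    return "408"                  # client went silent without finishing

normal = [(0.01, b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")]
slow = [(0.0, b"GET / HTTP/1.1\r\n"), (4.0, b"Host: a\r\n"),
        (8.0, b"X-a: b\r\n"), (12.0, b"\r\n")]
print(handle_request(normal), handle_request(slow))  # forward 408
```

A client that trickles one header every 4 seconds is cut off at the 5-second mark no matter how many headers it still intends to send, so it can no longer hold a connection open indefinitely.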

# On Aloha, the global section is already set up for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
    stats socket ./haproxy.stats level admin

defaults
    option http-server-close
    mode http
    timeout http-request 5s
    timeout connect 5s
    timeout server 10s
    timeout client 30s

listen stats
    bind 0.0.0.0:8880
    stats enable
    stats hide-version
    stats uri /
    stats realm HAProxy\ Statistics
    stats auth admin:admin

frontend ft_web
    bind 0.0.0.0:8080

    # Split static and dynamic traffic since these requests have different impacts on the servers
    use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

    default_backend bk_web

# Dynamic part of the application
backend bk_web
    balance roundrobin
    cookie MYSRV insert indirect nocache
    server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
    server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
    balance roundrobin
    server srv1 192.168.1.2:80 check maxconn 1000
    server srv2 192.168.1.3:80 check maxconn 1000

To test this option, connect to the HAProxy port with telnet and wait 5 seconds. The output looks like this:
telnet 127.0.0.1 8080
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
HTTP/1.0 408 Request Time-out
Cache-Control: no-cache
Connection: close
Content-Type: text/html

408 Request Time-out
Your browser didn't send a complete request in time.

Connection closed by foreign host.

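3. Unfair users, AKA abusers

What is an "unfair" user? It is a user (or a script) whose behavior differs from that of a normal user:

opening too many connections
opening new connections too fast
sending HTTP requests too fast
using too much bandwidth
a client that does not respect the RFCs (e.g. for SMTP)

What is normal browser behavior?

Before protecting our site from abnormal behavior, we need to know what normal browser behavior looks like.
First, the user opens Chrome, Firefox, Internet Explorer or Opera and types a URL.
The browser first resolves the IP address through DNS, then establishes a connection to the server, downloads the home page and parses its content.
It then follows the links in the HTML code to download the various objects: JavaScript, CSS, images and so on.
To download these objects, the browser opens 6 or 7 TCP connections per domain name.
Once all objects have been downloaded, the browser combines them and renders the whole page.

4. Limiting the number of connections per user

As explained above, each browser quickly opens 6 or 7 TCP connections to the web server, so we can consider
a user with more than 10 open TCP connections to be behaving abnormally.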

The following example limits the number of connections per user. The key part is the table definition and the tcp-request rules in the ft_web frontend.

# On Aloha, the global section is already set up for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
    stats socket ./haproxy.stats level admin

defaults
    option http-server-close
    mode http
    timeout http-request 5s
    timeout connect 5s
    timeout server 10s
    timeout client 30s

listen stats
    bind 0.0.0.0:8880
    stats enable
    stats hide-version
    stats uri /
    stats realm HAProxy\ Statistics
    stats auth admin:admin

frontend ft_web
    bind 0.0.0.0:8080

    # Table definition
    stick-table type ip size 100k expire 30s store conn_cur

    # Allow clean known IPs to bypass the filter
    tcp-request connection accept if { src -f /etc/haproxy/whitelist.lst }
    # Shut the new connection as long as the client already has 10 open
    tcp-request connection reject if { src_conn_cur ge 10 }
    tcp-request connection track-sc1 src

    # Split static and dynamic traffic since these requests have different impacts on the servers
    use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

    default_backend bk_web

# Dynamic part of the application
backend bk_web
    balance roundrobin
    cookie MYSRV insert indirect nocache
    server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
    server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
    balance roundrobin
    server srv1 192.168.1.2:80 check maxconn 1000
    server srv2 192.168.1.3:80 check maxconn 1000

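Notes:
1. If HAProxy serves several domain names, the conn_cur limit needs to be raised, because each domain name accounts for 5 to 7 connections.
2. If several users sit behind the same NAT device, they share one source IP and therefore one limit, so this restriction will penalize them.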
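
The order of the three tcp-request rules matters, and a short Python model may make it easier to follow. This is only a sketch: WHITELIST stands in for the contents of whitelist.lst with a made-up address, and on_connect and on_close are hypothetical names, not HAProxy functions.

```python
WHITELIST = {"10.0.0.1"}   # hypothetical stand-in for /etc/haproxy/whitelist.lst
MAX_CONN = 10
conn_cur = {}              # source IP -> currently open connections

def on_connect(src):
    """Evaluate the three rules in the order they appear in the frontend."""
    if src in WHITELIST:
        return "accept"                       # rule 1: bypass, later rules are skipped
    if conn_cur.get(src, 0) >= MAX_CONN:
        return "reject"                       # rule 2: src_conn_cur ge 10
    conn_cur[src] = conn_cur.get(src, 0) + 1  # rule 3: track-sc1 src
    return "accept"

def on_close(src):
    """A tracked connection ended, so the live counter goes down."""
    if conn_cur.get(src, 0) > 0:
        conn_cur[src] -= 1

results = [on_connect("192.0.2.7") for _ in range(11)]
print(results.count("accept"), results.count("reject"))  # 10 1
```

Because the reject rule is checked before the connection is tracked, a client gets exactly 10 connections and the 11th is refused. All users behind one NAT address share that same budget of 10, which is the drawback mentioned in note 2.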

To test the limit, run the following test:

Open 10 connections with Apache bench:
ab -n 50000000 -c 10 http://127.0.0.1:8080/

Look at the table through the HAProxy stats socket:
echo "show table ft_web" | socat unix:./haproxy.stats -
# table: ft_web, type: ip, size:102400, used:1
0x7afa34: key=127.0.0.1 use=10 exp=29994 conn_cur=10

Now try to open a new connection with telnet:
telnet 127.0.0.1 8080
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

The new connection has been rejected by HAProxy.
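
When the table holds more than a handful of entries, reading the "show table" output by eye gets tedious. Here is a minimal Python sketch that turns that output into dictionaries. It assumes the two-line layout shown above (a "# table:" header followed by one "address: key=... field=value" line per entry), and parse_show_table is a name made up for this example.

```python
def parse_show_table(output):
    """Parse the text returned by "show table <name>" into a list of dicts."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# table: ..." header
        _, _, fields = line.partition(": ")  # drop the leading entry address
        entry = {}
        for field in fields.split():
            name, _, value = field.partition("=")
            entry[name] = int(value) if value.isdigit() else value
        entries.append(entry)
    return entries

sample = """# table: ft_web, type: ip, size:102400, used:1
0x7afa34: key=127.0.0.1 use=10 exp=29994 conn_cur=10"""
for e in parse_show_table(sample):
    if e.get("conn_cur", 0) >= 10:
        print(e["key"], "is at the connection limit")
```

The sample string is copied from the output above. In practice the text would come from the socat command.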

5. Limiting the rate of new connections per user

We can consider a user who opens 10 or more connections within 3 seconds to be abnormal.
The following example limits the rate at which each user can open connections. The key part is the table definition and the tcp-request rules in the ft_web frontend.

# On Aloha, the global section is already set up for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
    stats socket ./haproxy.stats level admin

defaults
    option http-server-close
    mode http
    timeout http-request 5s
    timeout connect 5s
    timeout server 10s
    timeout client 30s

listen stats
    bind 0.0.0.0:8880
    stats enable
    stats hide-version
    stats uri /
    stats realm HAProxy\ Statistics
    stats auth admin:admin

frontend ft_web
    bind 0.0.0.0:8080

    # Table definition
    stick-table type ip size 100k expire 30s store conn_rate(3s)

    # Allow clean known IPs to bypass the filter
    tcp-request connection accept if { src -f /etc/haproxy/whitelist.lst }
    # Shut the new connection if the client opened 10 or more connections over the last 3 seconds
    tcp-request connection reject if { src_conn_rate ge 10 }
    tcp-request connection track-sc1 src

    # Split static and dynamic traffic since these requests have different impacts on the servers
    use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

    default_backend bk_web

# Dynamic part of the application
backend bk_web
    balance roundrobin
    cookie MYSRV insert indirect nocache
    server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
    server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
    balance roundrobin
    server srv1 192.168.1.2:80 check maxconn 1000
    server srv2 192.168.1.3:80 check maxconn 1000

Note: if several users sit behind the same NAT device, this limit will penalize them too, for the same reason.

To test the limit, run the following test:

ab -n 10 -c 1 -r http://127.0.0.1:8080/

echo "show table ft_web" | socat unix:./haproxy.stats -
# table: ft_web, type: ip, size:102400, used:1
0x11faa3c: key=127.0.0.1 use=0 exp=28395 conn_rate(3000)=10

telnet 127.0.0.1 8080
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.
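
The behavior seen in this test can be approximated with a sliding window in Python. This is an assumption made for clarity: HAProxy's own rate counter may compute the value differently, and the names check_rate, PERIOD and LIMIT exist only in this sketch.

```python
from collections import deque

PERIOD = 3.0   # seconds, as in conn_rate(3s)
LIMIT = 10     # as in "src_conn_rate ge 10"
recent = {}    # source IP -> deque of timestamps of accepted connections

def check_rate(src, now):
    """Return "accept" or "reject" for a connection arriving at time `now`."""
    window = recent.setdefault(src, deque())
    while window and now - window[0] >= PERIOD:
        window.popleft()        # forget connections older than the period
    if len(window) >= LIMIT:
        return "reject"         # rejected before the track rule, so not counted
    window.append(now)
    return "accept"

burst = [check_rate("127.0.0.1", t * 0.1) for t in range(10)]
print(burst.count("accept"), check_rate("127.0.0.1", 1.0))  # 10 reject
```

As in the telnet test, the connection that arrives right after a burst of 10 is refused. Once the burst has aged out of the 3-second window, new connections are accepted again.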

6. Limiting the HTTP request rate

# On Aloha, the global section is already set up for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
    stats socket ./haproxy.stats level admin

defaults
    option http-server-close
    mode http
    timeout http-request 5s
    timeout connect 5s
    timeout server 10s
    timeout client 30s

listen stats
    bind 0.0.0.0:8880
    stats enable
    stats hide-version
    stats uri /
    stats realm HAProxy\ Statistics
    stats auth admin:admin

frontend ft_web
    bind 0.0.0.0:8080

    # Use General Purpose Counter (gpc) 0 in SC1 as a global abuse counter
    # Monitors the number of requests sent by an IP over a period of 10 seconds
    stick-table type ip size 1m expire 10s store gpc0,http_req_rate(10s)
    tcp-request connection track-sc1 src
    tcp-request connection reject if { src_get_gpc0 gt 0 }

    # Split static and dynamic traffic since these requests have different impacts on the servers
    use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

    default_backend bk_web

# Dynamic part of the application
backend bk_web
    balance roundrobin
    cookie MYSRV insert indirect nocache

    # If the source IP sent 10 or more HTTP requests over the defined period,
    # flag the IP as abuser on the frontend
    acl abuse src_http_req_rate(ft_web) ge 10
    acl flag_abuser src_inc_gpc0(ft_web)
    tcp-request content reject if abuse flag_abuser

    server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
    server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
    balance roundrobin
    server srv1 192.168.1.2:80 check maxconn 1000
    server srv2 192.168.1.3:80 check maxconn 1000

This can be tested in the same way as above.

7. Detecting vulnerability scans

If someone runs a vulnerability scan against our site, the various errors it triggers can be tracked with HAProxy.

HAProxy can monitor the rate of errors generated by each user and decide on further action based on that rate.

# On Aloha, the global section is already set up for you
# and the haproxy stats socket is available at /var/run/haproxy.stats
global
    stats socket ./haproxy.stats level admin

defaults
    option http-server-close
    mode http
    timeout http-request 5s
    timeout connect 5s
    timeout server 10s
    timeout client 30s

listen stats
    bind 0.0.0.0:8880
    stats enable
    stats hide-version
    stats uri /
    stats realm HAProxy\ Statistics
    stats auth admin:admin

frontend ft_web
    bind 0.0.0.0:8080

    # Use General Purpose Counter 0 in SC1 as a global abuse counter
    # Monitors the number of errors generated by an IP over a period of 10 seconds
    stick-table type ip size 1m expire 10s store gpc0,http_err_rate(10s)
    tcp-request connection track-sc1 src
    tcp-request connection reject if { src_get_gpc0 gt 0 }

    # Split static and dynamic traffic since these requests have different impacts on the servers
    use_backend bk_web_static if { path_end .jpg .png .gif .css .js }

    default_backend bk_web

# Dynamic part of the application
backend bk_web
    balance roundrobin
    cookie MYSRV insert indirect nocache

    # If the source IP generated 10 or more HTTP errors over the defined period,
    # flag the IP as abuser on the frontend
    acl abuse src_http_err_rate(ft_web) ge 10
    acl flag_abuser src_inc_gpc0(ft_web)
    tcp-request content reject if abuse flag_abuser

    server srv1 192.168.1.2:80 check cookie srv1 maxconn 100
    server srv2 192.168.1.3:80 check cookie srv2 maxconn 100

# Static objects
backend bk_web_static
    balance roundrobin
    server srv1 192.168.1.2:80 check maxconn 1000
    server srv2 192.168.1.3:80 check maxconn 1000

We can test it as follows:

ab -n 10 http://127.0.0.1:8080/dlskfjlkdsjlkfdsj

echo "show table ft_web" | socat unix:./haproxy.stats -
# table: ft_web, type: ip, size:1048576, used:1
0x8a9770: key=127.0.0.1 use=0 exp=5866 gpc0=1 http_err_rate(10000)=11

Running the ab command above a second time produces the following error:
apr_socket_recv: Connection reset by peer (104)

This shows that HAProxy has blocked the IP.

------------------------
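PS: Here is my understanding of how the gpc0 counter is used. I have not fully figured this part out and still have some doubts, so if anyone knows better, I would welcome an explanation.

The official HAProxy 1.5 documentation contains the following example: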

frontend http
    # Use General Purpose Counter 0 in SC1 as a global abuse counter
    # protecting all our sites
    stick-table type ip size 1m expire 5m store gpc0
    tcp-request connection track-sc1 src
    tcp-request connection reject if { sc1_get_gpc0 gt 0 }
    ...
    use_backend http_dynamic if { path_end .php }

backend http_dynamic
    # if a source makes too fast requests to this dynamic site (tracked
    # by SC2), block it globally in the frontend.
    stick-table type ip size 1m expire 5m store http_req_rate(10s)
    acl click_too_fast sc2_http_req_rate gt 10
    acl mark_as_abuser sc1_inc_gpc0
    tcp-request content track-sc2 src
    tcp-request content reject if click_too_fast mark_as_abuser

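This is how I understand the example above:

1. The backend section

It defines a table keyed by IP address, holding up to 1m (about one million) entries, with a 5-minute expiry, storing the HTTP request rate over 10 seconds.
sc2_http_req_rate returns the current rate. If it is greater than 10, the acl click_too_fast matches.
Once click_too_fast has matched, the user is considered an abuser and mark_as_abuser is evaluated, which increments the gpc0 counter by 1.
Requests that match both click_too_fast and mark_as_abuser are rejected.

2. The frontend section

It defines a table keyed by IP address, holding up to 1m entries, with a 5-minute expiry, storing the general purpose counter gpc0.
It tracks the source address of every connection. (I am not sure what this is for, explanations welcome.)
It rejects at layer 4 any IP that the backend has flagged as an abuser (acl mark_as_abuser sc1_inc_gpc0).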
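
The interplay between the two sections can be sketched in Python. Treat this as one plausible reading rather than a statement of how HAProxy works internally: the dictionaries gpc0 and req_rate stand in for the frontend and backend tables, and the function names simply mirror the ACL names. On this reading, the track-sc1 line binds each connection to its entry in the frontend table, which gives sc1_inc_gpc0 and sc1_get_gpc0 an entry to operate on.

```python
RATE_LIMIT = 10
gpc0 = {}       # frontend table (SC1): source IP -> abuse counter
req_rate = {}   # backend table (SC2): source IP -> requests in the last 10s

def click_too_fast(src):
    return req_rate.get(src, 0) > RATE_LIMIT     # sc2_http_req_rate gt 10

def mark_as_abuser(src):
    gpc0[src] = gpc0.get(src, 0) + 1             # sc1_inc_gpc0
    return gpc0[src] > 0

def backend_rule(src):
    # "and" short-circuits, so the counter is only incremented for fast clients
    if click_too_fast(src) and mark_as_abuser(src):
        return "reject"
    return "accept"

def frontend_rule(src):
    # sc1_get_gpc0 gt 0: refuse anyone already flagged by the backend
    return "reject" if gpc0.get(src, 0) > 0 else "accept"

req_rate["198.51.100.9"] = 11
print(backend_rule("198.51.100.9"), frontend_rule("198.51.100.9"))  # reject reject
```

The point of the two-condition rule is the side effect: a slow client never reaches mark_as_abuser, so its counter stays at zero, while a fast client is rejected in the backend and then refused at connection time in the frontend until its entry expires.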

Here is another example:
   # block if 5 consecutive requests continue to come faster than 10 sess
   # per second, and reset the counter as soon as the traffic slows down.
   acl abuse src_http_req_rate gt 10
   acl kill  src_inc_gpc0 gt 5
   acl save  src_clr_gpc0
   tcp-request connection accept if !abuse save
   tcp-request connection reject if abuse kill
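
The "count consecutive offences, reset on good behavior" pattern in this last example can be modeled in a few lines of Python. This is a simplified sketch: the current request rate is passed in as a plain number, strikes plays the role of gpc0, and on_connection is a hypothetical name.

```python
RATE_LIMIT = 10   # as in "src_http_req_rate gt 10"
MAX_STRIKES = 5   # as in "src_inc_gpc0 gt 5"
strikes = {}      # source IP -> value of gpc0

def on_connection(src, req_rate):
    """Return "accept" or "reject" for a new connection from src."""
    if req_rate > RATE_LIMIT:                    # acl abuse
        strikes[src] = strikes.get(src, 0) + 1   # acl kill increments gpc0
        return "reject" if strikes[src] > MAX_STRIKES else "accept"
    strikes[src] = 0                             # acl save clears gpc0
    return "accept"

ip = "203.0.113.5"
print([on_connection(ip, 50) for _ in range(6)].count("reject"))  # 1
```

Five abusive connections in a row are tolerated and the sixth is rejected. A single connection made at a normal rate resets the counter, so an occasional burst does not get a client blocked.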
