===================================================
1. Simple HTTP load-balancing with cookie insertion
===================================================
A web application often saturates its front-end server with high CPU load
because of the scripting language involved. It also relies on a back-end
database which is only lightly loaded. User contexts are stored on the server
itself, not in the database, so simply adding another server behind plain
IP/TCP load-balancing would not work: a client could land on a server which
does not hold its context.
    +-------+
    |clients|  clients and/or reverse-proxy
    +---+---+
        |
       -+-----+--------+----
              |       _|_db
           +--+--+   (___)
           | web |   (___)
           +-----+   (___)
       192.168.1.1   192.168.1.2
Replacing the web server with a bigger SMP system would cost much more than
adding low-cost pizza boxes. The solution is to buy N cheap boxes and install
the application on them, then install haproxy on the old server, which will
spread the load across the new boxes.
  192.168.1.1    192.168.1.11-192.168.1.14   192.168.1.2
 -------+-----------+-----+-----+-----+--------+----
        |           |     |     |     |       _|_db
     +--+--+      +-+-+ +-+-+ +-+-+ +-+-+    (___)
     | LB1 |      | A | | B | | C | | D |    (___)
     +-----+      +---+ +---+ +---+ +---+    (___)
     haproxy        4 cheap web servers
Config on haproxy (LB1) :
-------------------------
    listen webfarm 192.168.1.1:80
       mode http
       balance roundrobin
       cookie SERVERID insert indirect
       option httpchk HEAD /index.html HTTP/1.0
       server webA 192.168.1.11:80 cookie A check
       server webB 192.168.1.12:80 cookie B check
       server webC 192.168.1.13:80 cookie C check
       server webD 192.168.1.14:80 cookie D check
Description :
-------------
- LB1 will receive client requests.
- if a request does not contain a cookie, it will be forwarded to a valid
  server.
- in return, a cookie "SERVERID" will be inserted in the response, holding the
  server's cookie value (e.g. "A").
- when the client comes again with the cookie "SERVERID=A", LB1 will know that
  the request must be forwarded to server A. The cookie will be removed so
  that the server does not see it.
- if server "webA" dies, the requests will be sent to another valid server
  and a cookie will be reassigned.
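
The decision steps listed above can be sketched in a few lines of Python. This
is only an illustration of the described behaviour, not haproxy's code: the
names make_router, route and is_up are hypothetical, and health checking is
reduced to a callback.

```python
from itertools import cycle

# Cookie value -> server address, as in the configuration above.
SERVERS = {"A": "192.168.1.11:80", "B": "192.168.1.12:80",
           "C": "192.168.1.13:80", "D": "192.168.1.14:80"}

def make_router(servers, is_up):
    """Return a route() function imitating 'cookie SERVERID insert indirect'."""
    rr = cycle(sorted(servers))  # round-robin over the cookie values

    def route(cookies):
        """Return (server, cookies forwarded to the server, Set-Cookie or None)."""
        forwarded = dict(cookies)
        # "indirect": the server never sees the SERVERID cookie.
        wanted = forwarded.pop("SERVERID", None)
        if wanted in servers and is_up(wanted):
            return wanted, forwarded, None  # the client already knows the cookie
        # No cookie, unknown value or dead server: take the next valid server
        # and (re)assign the cookie in the response.
        for _ in range(len(servers)):
            name = next(rr)
            if is_up(name):
                return name, forwarded, f"SERVERID={name}"
        raise RuntimeError("no valid server available")

    return route
```

For instance, a first call with no cookie returns server "A" together with
"SERVERID=A"; a later call carrying that cookie returns "A" with nothing to
set, and moves to another server only once is_up("A") becomes false.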
Flows :
-------
(client)                           (haproxy)                         (server A)
  >-- GET /URI1 HTTP/1.0 ------------> |
               ( no cookie, haproxy forwards in load-balancing mode. )
                                       | >-- GET /URI1 HTTP/1.0 ---------->
                                       | <-- HTTP/1.0 200 OK -------------<
               ( the proxy now adds the server cookie in return )
  <-- HTTP/1.0 200 OK ---------------< |
      Set-Cookie: SERVERID=A           |
  >-- GET /URI2 HTTP/1.0 ------------> |
      Cookie: SERVERID=A               |
      ( the proxy sees the cookie. it forwards to server A and deletes it )
                                       | >-- GET /URI2 HTTP/1.0 ---------->
                                       | <-- HTTP/1.0 200 OK -------------<
   ( the proxy does not add the cookie in return because the client knows it )
  <-- HTTP/1.0 200 OK ---------------< |
  >-- GET /URI3 HTTP/1.0 ------------> |
      Cookie: SERVERID=A               |
                                    ( ... )
Limits :
--------
- if clients use keep-alive (HTTP/1.1), only the first response will have
  a cookie inserted, and only the first request of each session will be
  analyzed. This does not cause trouble in insertion mode because the cookie
  is put immediately in the first response, and the session is maintained to
  the same server for all subsequent requests in the same session. However,
  the cookie will not be removed from the requests forwarded to the servers,
  so the server must not be sensitive to unknown cookies. If this causes
  trouble, you can disable keep-alive by adding the following option :

      option httpclose

- if for some reason the clients cannot learn more than one cookie (e.g. the
  clients are home-made applications or gateways), and the application
  already produces a cookie, you can use the "prefix" mode (see below).

- LB1 becomes a very sensitive server: it is a single point of failure. If
  LB1 dies, nothing works anymore.
  => you can back it up using keepalived (see below)

- if the application needs to log the original client's IP, use the
  "forwardfor" option which will add an "X-Forwarded-For" header with the
  original client's IP address. You must also use "httpclose" to ensure
  that every request is rewritten, and not only the first one of each
  session :

      option httpclose
      option forwardfor

  The web server will have to be configured to log this header instead of
  the connection's source address. For example, on apache, you can use
  LogFormat for this :

      LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b " combined
      CustomLog /var/log/httpd/access_log combined
Hints :
-------
On the internet, you will sometimes find that a few percent of the clients
have disabled cookies in their browser. Obviously they have trouble everywhere
on the web, but you can still help them access your site by using the "source"
balancing algorithm instead of "roundrobin". It ensures that a given IP
address always reaches the same server as long as the number of servers remains
unchanged. Never use this when clients reach you through a proxy or from a
small network, because many clients then share a few addresses and the
distribution will be unfair. However, in large internal networks, and on the
internet, it works quite well. Clients which have a dynamic address will not
be affected as long as they accept the cookie, because the cookie always has
precedence over load balancing :
    listen webfarm 192.168.1.1:80
       mode http
       balance source
       cookie SERVERID insert indirect
       option httpchk HEAD /index.html HTTP/1.0
       server webA 192.168.1.11:80 cookie A check
       server webB 192.168.1.12:80 cookie B check
       server webC 192.168.1.13:80 cookie C check
       server webD 192.168.1.14:80 cookie D check
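
To see why the mapping only holds "as long as the number of servers remains
unchanged", here is a minimal sketch of source-based selection. The hash used
here (the integer value of the address modulo the farm size) is an assumption
chosen for illustration, not necessarily the function haproxy uses, and
pick_by_source is a hypothetical name.

```python
import ipaddress

SERVERS = ["webA", "webB", "webC", "webD"]

def pick_by_source(client_ip, servers):
    """Map a client address to a server: same IP, same server, same farm size."""
    # Illustrative hash only: address as an integer, modulo the number of servers.
    return servers[int(ipaddress.ip_address(client_ip)) % len(servers)]

# The same address always lands on the same server...
first = pick_by_source("10.1.2.3", SERVERS)
again = pick_by_source("10.1.2.3", SERVERS)
# ...but the result may change as soon as the farm shrinks or grows.
smaller_farm = pick_by_source("10.1.2.3", SERVERS[:3])
```

With this particular hash, "10.1.2.3" maps to webD in the four-server farm and
to webB once only three servers remain, which is why the cookie is still
needed to keep existing sessions on their server.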
==================================================================
2. HTTP load-balancing with cookie prefixing and high availability
==================================================================
Now you don't want to add more cookies, but rather use existing ones. The
application already generates a "JSESSIONID" cookie which is enough to track
sessions, so we'll prefix this cookie with the server name when we see it.
Since the load-balancer becomes critical, it will be backed up with a second
one in VRRP mode using keepalived under Linux.
Download the latest version of keepalived from this site and install it
on each load-balancer LB1 and LB2 :
http://www.keepalived.org/
You then have a shared IP between the two load-balancers (we will still use the
original IP). It is active on only one of them at any moment. To allow the
proxy to bind to the shared IP even on the node which does not currently hold
it, on Linux 2.4 you must enable non-local binds in /proc :

    # echo 1 >/proc/sys/net/ipv4/ip_nonlocal_bind

                  shared IP=192.168.1.1
  192.168.1.3  192.168.1.4    192.168.1.11-192.168.1.14   192.168.1.2
 -------+------------+-----------+-----+-----+-----+--------+----
        |            |           |     |     |     |       _|_db
     +--+--+      +--+--+      +-+-+ +-+-+ +-+-+ +-+-+    (___)
     | LB1 |      | LB2 |      | A | | B | | C | | D |    (___)
     +-----+      +-----+      +---+ +---+ +---+ +---+    (___)
     haproxy      haproxy        4 cheap web servers
     keepalived   keepalived
Config on both proxies (LB1 and LB2) :
--------------------------------------
    listen webfarm 192.168.1.1:80
       mode http
       balance roundrobin
       cookie JSESSIONID prefix
       option httpclose
       option forwardfor
       option httpchk HEAD /index.html HTTP/1.0
       server webA 192.168.1.11:80 cookie A check
       server webB 192.168.1.12:80 cookie B check
       server webC 192.168.1.13:80 cookie C check
       server webD 192.168.1.14:80 cookie D check
Notes: the proxy will modify EVERY cookie sent by the client and the server,
so it is important that it can access ALL cookies in ALL requests for
each session. This implies that there is no keep-alive (HTTP/1.1), hence the
"httpclose" option. You can remove this option only if you know for sure that
the client(s) will never use keep-alive (e.g. Apache 1.3 in reverse-proxy
mode).
Configuration for keepalived on LB1/LB2 :
-----------------------------------------
    vrrp_script chk_haproxy {           # Requires keepalived-1.1.13
        script "killall -0 haproxy"     # cheaper than pidof
        interval 2                      # check every 2 seconds
        weight 2                        # add 2 points of prio if OK
    }

    vrrp_instance VI_1 {
        interface eth0
        state MASTER
        virtual_router_id 51
        priority 101                    # 101 on master, 100 on backup
        virtual_ipaddress {
            192.168.1.1
        }
        track_script {
            chk_haproxy
        }
    }
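
The priority arithmetic implied by this configuration can be checked with a
small sketch. It assumes the node with the highest effective priority holds
the shared IP and ignores VRRP timing and preemption settings;
effective_priority and vip_owner are hypothetical helper names.

```python
def effective_priority(base, haproxy_running, weight=2):
    """Base VRRP priority plus the track_script bonus when the check succeeds."""
    return base + (weight if haproxy_running else 0)

def vip_owner(lb1_haproxy_running, lb2_haproxy_running):
    """Return the node expected to hold 192.168.1.1 (highest priority wins)."""
    lb1 = effective_priority(101, lb1_haproxy_running)  # master, priority 101
    lb2 = effective_priority(100, lb2_haproxy_running)  # backup, priority 100
    return "LB1" if lb1 > lb2 else "LB2"
```

With both haproxy processes running, LB1 wins with 103 against 102. If haproxy
dies on LB1 only, LB1 falls back to 101 and LB2 takes over with 102. A weight
of 2 is enough because the base priorities differ by only 1.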
Description :
-------------
- LB1 is VRRP master (keepalived), LB2 is backup. Both monitor the haproxy
  process; a node whose haproxy fails loses its 2-point priority bonus,
  leading to a failover to the other node.
- LB1 will receive client requests on IP 192.168.1.1.
- both load-balancers send their checks from their native IP.
- if a request does not contain a cookie, it will be forwarded to a valid
  server.
- in return, if a JSESSIONID cookie is seen, the server name will be prefixed
  into it, followed by a delimiter ('~').
- when the client comes again with the cookie "JSESSIONID=A~xxx", LB1 will
  know that the request must be forwarded to server A. The server name will
  then be removed from the cookie before it is sent to the server.
- if server "webA" dies, the requests will be sent to another valid server
  and a cookie will be reassigned.
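
The two cookie rewrites described above (prefixing on the way out, stripping
on the way in) can be sketched as follows. This is an illustration of the
described behaviour only; the function names are hypothetical and the set of
known cookie values is taken from the configuration.

```python
DELIM = "~"
SERVERS = {"A", "B", "C", "D"}  # cookie values declared on the "server" lines

def prefix_response_cookie(value, server):
    """Server -> client: JSESSIONID=123 set by server A becomes A~123."""
    return f"{server}{DELIM}{value}"

def strip_request_cookie(value):
    """Client -> server: split 'A~123' into ('A', '123').

    Returns (None, value) unchanged when no known server prefix is present,
    in which case the request is simply load-balanced.
    """
    server, sep, rest = value.partition(DELIM)
    if sep and server in SERVERS:
        return server, rest
    return None, value
```

The server always sees the value it originally sent ("123"), while the client
only ever stores the prefixed form ("A~123").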
Flows :
-------
(client)                           (haproxy)                         (server A)
  >-- GET /URI1 HTTP/1.0 ------------> |
               ( no cookie, haproxy forwards in load-balancing mode. )
                                       | >-- GET /URI1 HTTP/1.0 ---------->
                                       |     X-Forwarded-For: 10.1.2.3
                                       | <-- HTTP/1.0 200 OK -------------<
                        ( no cookie, nothing changed )
  <-- HTTP/1.0 200 OK ---------------< |
  >-- GET /URI2 HTTP/1.0 ------------> |
    ( no cookie, haproxy forwards in lb mode, possibly to another server. )
                                       | >-- GET /URI2 HTTP/1.0 ---------->
                                       |     X-Forwarded-For: 10.1.2.3
                                       | <-- HTTP/1.0 200 OK -------------<
                                       |     Set-Cookie: JSESSIONID=123
    ( the cookie is identified, it will be prefixed with the server name )
  <-- HTTP/1.0 200 OK ---------------< |
      Set-Cookie: JSESSIONID=A~123     |
  >-- GET /URI3 HTTP/1.0 ------------> |
      Cookie: JSESSIONID=A~123         |
       ( the proxy sees the cookie, removes the server name and forwards
          to server A which sees the same cookie as it previously sent )
                                       | >-- GET /URI3 HTTP/1.0 ---------->
                                       |     Cookie: JSESSIONID=123
                                       |     X-Forwarded-For: 10.1.2.3
                                       | <-- HTTP/1.0 200 OK -------------<
                        ( no cookie, nothing changed )
  <-- HTTP/1.0 200 OK ---------------< |
                                    ( ... )
Hints :
-------
Sometimes, there will be some powerful servers in the farm, and some smaller
ones. In this situation, it may be desirable to tell haproxy to respect the
difference in performance. Let's consider that webA and webB are two old
P3-1.2 GHz machines while webC and webD are shiny new Opteron-2.6 GHz ones.
If your application scales with CPU, you may assume a very rough 2.6/1.2
performance ratio between the servers. You can inform haproxy about this using
the "weight" keyword, with values between 1 and 256. It will then spread the
load as smoothly as possible while respecting those ratios :

    server webA 192.168.1.11:80 cookie A weight 12 check
    server webB 192.168.1.12:80 cookie B weight 12 check
    server webC 192.168.1.13:80 cookie C weight 26 check
    server webD 192.168.1.14:80 cookie D weight 26 check
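
As a rough check of what these weights mean, the sketch below derives them
from the clock speeds and computes the fraction of requests each server
should receive on average. The scale factor of 10 and the helper names are
assumptions made for this example; any integers in the same ratio would do.

```python
from fractions import Fraction

def weights_from_clock(ghz_by_server, scale=10):
    """Turn clock speeds into integer weights, e.g. 1.2 GHz -> 12 with scale=10."""
    weights = {name: round(ghz * scale) for name, ghz in ghz_by_server.items()}
    # The text above gives 1..256 as the accepted range for "weight".
    assert all(1 <= w <= 256 for w in weights.values())
    return weights

def expected_share(weights):
    """Fraction of the total load each server gets if weights are respected."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}

weights = weights_from_clock({"webA": 1.2, "webB": 1.2, "webC": 2.6, "webD": 2.6})
share = expected_share(weights)
```

Here each old server gets 12/76 of the requests (about 16%) and each new one
26/76 (about 34%).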