Using cgroups to Control Memory Usage

Posted by zhoujun.g on 2016-11-22 06:14:09

Polishing the gems of technology, practicing the way of data, pursuing outstanding value
[Author: Gao Jian @ cnblogs, luckyjackgao@gmail.com]
　　
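First, following an example found online, I ran a hands-on experiment.
To begin, check memory before downloading with no memory limit in place: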

# free -m
total    used    free shared buffers cached
Mem:       2006    484    1522       0       29    175
-/+ buffers/cache:    279    1727
Swap:       4031       0    4031
#
Then download a file of about 700 MB:
# wget http://centos.arcticnetwork.ca/6.4/isos/x86_64/CentOS-6.4-x86_64-LiveCD.iso
Then check memory usage:

# free -m
total    used    free shared buffers cached
Mem:       2006    1224    782       0       33    878
-/+ buffers/cache:    312    1694
Swap:       4031       0    4031
#
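Indeed, more than 700 MB of memory was used: used rose from 484 to 1224, and most of the increase shows up under cached (175 to 878).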
　　
Next, reboot and apply a memory limit:

# service cgconfig status
Stopped

# mount -t cgroup -o memory memcg /cgroup
# mkdir /cgroup/GroupA
# echo 10M > /cgroup/GroupA/memory.limit_in_bytes
# echo $$ > /cgroup/GroupA/tasks
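
When the limit is read back with cat /cgroup/GroupA/memory.limit_in_bytes, the kernel reports it in bytes rather than with the M suffix. The following is a minimal sketch of that conversion, assuming the 1024-based K/M/G suffixes the kernel accepts for this file; to_bytes is a hypothetical helper name, not part of any cgroup tool.

```shell
# Convert a size such as 10M to bytes using 1024-based K/M/G suffixes.
# to_bytes is a hypothetical helper written for this illustration.
to_bytes() {
    num=${1%[KkMmGg]}          # numeric part with any suffix removed
    case $1 in
        *[Kk]) echo $(($num * 1024)) ;;
        *[Mm]) echo $(($num * 1024 * 1024)) ;;
        *[Gg]) echo $(($num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

to_bytes 10M    # the limit written to GroupA above; prints 10485760
```

Comparing this value with what the file actually contains is a quick way to confirm that the write succeeded.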
Then check memory again:

# free -m
total    used    free shared buffers cached
Mem:       2006    481    1525       0       29    174
-/+ buffers/cache:    276    1729
Swap:       4031       0    4031
#
Download the same file of about 700 MB again:
# wget http://centos.arcticnetwork.ca/6.4/isos/x86_64/CentOS-6.4-x86_64-LiveCD.iso
Then compare memory usage before and after:

# free -m
total    used    free shared buffers cached
Mem:       2006    512    1494       0       32    186
-/+ buffers/cache:    293    1713
Swap:       4031       0    4031
#
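So roughly 1525 - 1494 = 31 MB of memory was consumed. The result observed through free is imprecise, however: free reports system-wide figures, and while a long-running program executes, the free value keeps decreasing. Since the current shell is limited to at most 10 MB, the baseline is small, so the longer the run, the larger the error.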
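
Because free only shows system-wide numbers, the memory controller's own counters give a more direct view of one group. The sketch below reads memory.usage_in_bytes, memory.max_usage_in_bytes and memory.failcnt from a cgroup v1 memory group directory; cgroup_mem_report is a hypothetical helper name, and the /cgroup/GroupA path is the one assumed from the commands above.

```shell
# Print current usage, peak usage and limit-hit count for one memory cgroup.
# cgroup_mem_report is a hypothetical helper; $1 is assumed to be a
# cgroup v1 memory group directory such as /cgroup/GroupA.
cgroup_mem_report() {
    dir=$1
    usage=$(cat "$dir/memory.usage_in_bytes")
    peak=$(cat "$dir/memory.max_usage_in_bytes")
    fails=$(cat "$dir/memory.failcnt")
    echo "usage=$(($usage / 1048576))MiB peak=$(($peak / 1048576))MiB failcnt=$fails"
}

# Only runs where the group from this article exists.
if [ -r /cgroup/GroupA/memory.usage_in_bytes ]; then
    cgroup_mem_report /cgroup/GroupA
fi
```

A failcnt that keeps rising during the download would indicate that the group is repeatedly hitting its limit.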
Next, let's see whether this can limit PostgreSQL effectively.
Before that, use the system configuration to control memory while the postgres user runs wget:

$ cat /etc/cgconfig.conf
#
#Copyright IBM Corporation. 2007
#
#Authors: Balbir Singh <balbir@linux.vnet.ibm.com>
#This program is free software; you can redistribute it and/or modify it
#under the terms of version 2.1 of the GNU Lesser General Public License
#as published by the Free Software Foundation.
#
#This program is distributed in the hope that it would be useful, but
#WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See man cgconfig.conf for further details.
#
# By default, mount all controllers to /cgroup/<controller>
mount {
    cpuset = /cgroup/cpuset;
    cpu = /cgroup/cpu;
    cpuacct = /cgroup/cpuacct;
    memory = /cgroup/memory;
    devices = /cgroup/devices;
    freezer = /cgroup/freezer;
    net_cls = /cgroup/net_cls;
    blkio = /cgroup/blkio;
}
group test1 {
    perm {
        task {
            uid = postgres;
            gid = postgres;
        }
        admin {
            uid = root;
            gid = root;
        }
    }
    memory {
        memory.limit_in_bytes = 30M;
    }
}
$
Another file, cgrules.conf, is also important:

$ cat /etc/cgrules.conf
# /etc/cgrules.conf
#
#Each line describes a rule for a user in the forms:
#
#<user>          <controllers>    <destination>
#<user>:<process name> <controllers>    <destination>
#
#Where:
# <user> can be:
#    - an user name
#    - a group name, with @group syntax
#    - the wildcard *, for any user or group.
#    - The %, which is equivalent to "ditto". This is useful for
#       multiline rules where different cgroups need to be specified
#       for various hierarchies for a single user.
#
# <process name> is optional and it can be:
# - a process name
# - a full command path of a process
#
# <controller> can be:
#    - comma separated controller names (no spaces)
#    - * (for all mounted controllers)
#
# <destination> can be:
#    - path with-in the controller hierarchy (ex. pgrp1/gid1/uid1)
#
# Note:
# - It currently has rules based on uids, gids and process name.
#
# - Don't put overlapping rules. First rule which matches the criteria
# will be executed.
#
# - Multiline rules can be specified for specifying different cgroups
# for multiple hierarchies. In the example below, user "peter" has
# specified 2 line rule. First line says put peter's task in test1/
# dir for "cpu" controller and second line says put peter's tasks in
# test2/ dir for memory controller. Make a note of "%" sign in second line.
# This is an indication that it is continuation of previous rule.
#
#
#<user>    <controllers>    <destination>
#
#john       cpu    usergroup/faculty/john/
#john:cp    cpu    usergroup/faculty/john/cp
#@student    cpu,memory usergroup/student/
#peter       cpu    test1/
#%       memory    test2/
#@root          *    admingroup/
#*    *    default/
# End of file
postgres    memory       test1/
#
$
As root, enable the following two services at boot:
# chkconfig cgconfig on
# chkconfig cgred on
Then reboot, log in as the postgres user, and verify:

$ free -m
total    used    free shared buffers cached
Mem:       2006    381    1625       0       25    134
-/+ buffers/cache:    221    1785

$ wget http://centos.arcticnetwork.ca/6.4/isos/x86_64/CentOS-6.4-x86_64-LiveCD.iso
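After the download finishes, check memory again. The limit worked: used grew by only about 12 MB (from 381 to 393).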

$ free -m
total    used    free shared buffers cached
Mem:       2006    393    1613       0       28    141
-/+ buffers/cache:    224    1782
Swap:       4031       67    3964
$
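
To confirm that the rule in cgrules.conf really placed the postgres session in test1, the kernel's per-process cgroup listing can be inspected. This is a minimal sketch assuming the cgroup v1 line format hierarchy-ID:controllers:path in /proc/<pid>/cgroup; memory_cgroup_of is a hypothetical helper name.

```shell
# Print the memory-controller cgroup path recorded in a /proc/<pid>/cgroup
# style file (cgroup v1 format: hierarchy-ID:controllers:path).
# memory_cgroup_of is a hypothetical helper written for this illustration.
memory_cgroup_of() {
    awk -F: '$2 ~ /(^|,)memory(,|$)/ { print $3 }' "$1"
}

# Inspect the current shell where /proc is available.
if [ -r /proc/self/cgroup ]; then
    memory_cgroup_of /proc/self/cgroup
fi
```

With the configuration shown above, a shell owned by postgres would be expected to report /test1.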
　　
Now let's see how the limit affects SQL executed inside PostgreSQL:
　　

Step 1: Configure /etc/cgconfig.conf and /etc/cgrules.conf as described above.

Step 2: Check memory before the run:

$ free -m
         total    used    free shared buffers cached
Mem:       2006    384    1622       0       26    138
-/+ buffers/cache:    219    1787
Swap:       4031       87    3944
$

Step 3: Start processing a large amount of data (about 600 MB):

postgres=# select count(*) from test01;
count
-------
0
(1 row)
postgres=# insert into test01 values(generate_series(1,614400),repeat( chr(int4(random()*26)+65),1024));
Almost as soon as the statement started running, the following error appeared:

The connection to the server was lost. Attempting reset: Failed.
!>
This matches the crash I had run into earlier.
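
As a rough check on the size of the workload that triggered the crash, the text payload of the INSERT can be estimated from its two arguments. The variable names below are chosen for illustration, and the estimate ignores per-row overhead inside PostgreSQL.

```shell
rows=614400          # upper bound passed to generate_series
bytes_per_row=1024   # length passed to repeat()
total_mib=$(($rows * $bytes_per_row / 1024 / 1024))
echo "${total_mib} MiB of text payload"   # prints: 600 MiB of text payload
```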

PostgreSQL's own log shows the following:

$ LOG:database system was shut down at 2013-09-09 16:20:29 CST
LOG:database system is ready to accept connections
LOG:autovacuum launcher started
LOG:server process (PID 2697) was terminated by signal 9: Killed
DETAIL:Failed process was running: insert into test01 values(generate_series(1,614400),repeat( chr(int4(random()*26)+65),1024));
LOG:terminating any other active server processes
WARNING:terminating connection because of crash of another server process
DETAIL:The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:In a moment you should be able to reconnect to the database and repeat your command.
FATAL:the database system is in recovery mode
LOG:all server processes terminated; reinitializing
LOG:database system was interrupted; last known up at 2013-09-09 17:35:42 CST
LOG:database system was not properly shut down; automatic recovery in progress
LOG:redo starts at 1/9E807C90
LOG:unexpected pageaddr 1/946BE000 in log file 1, segment 159, offset 7069696
LOG:redo done at 1/9F6BDB50
LOG:database system is ready to accept connections
LOG:autovacuum launcher started
The dmesg output shows that an out-of-memory error occurred; this time it is a cgroup out of memory:
　　

$ dmesg | grep post
[ 2673]   500  2673    64453      200   0       0             0 postgres
[ 2675]   500  2675    64494       79   0       0             0 postgres
[ 2676]   500  2676    64453       75   0       0             0 postgres
[ 2677]   500  2677    64453       77   0       0             0 postgres
[ 2678]   500  2678    64667       80   0       0             0 postgres
[ 2679]   500  2679    28359       72   0       0             0 postgres
[ 2697]   500  2697    64764      100   0       0             0 postgres
[ 2673]   500  2673    64453      200   0       0             0 postgres
[ 2675]   500  2675    64494       79   0       0             0 postgres
[ 2676]   500  2676    64453       75   0       0             0 postgres
[ 2677]   500  2677    64453       77   0       0             0 postgres
[ 2678]   500  2678    64667       80   0       0             0 postgres
[ 2679]   500  2679    28359       72   0       0             0 postgres
[ 2697]   500  2697    64764      100   0       0             0 postgres
[ 2673]   500  2673    64453      208   0       0             0 postgres
[ 2675]   500  2675    64494       79   0       0             0 postgres
[ 2676]   500  2676    64453       98   0       0             0 postgres
[ 2677]   500  2677    64453      782   0       0             0 postgres
[ 2678]   500  2678    64667      133   0       0             0 postgres
[ 2679]   500  2679    28359       86   0       0             0 postgres
[ 2697]   500  2697    73075     3036   0       0             0 postgres
Memory cgroup out of memory: Kill process 2697 (postgres) score 1000 or sacrifice child
Killed process 2697, UID 500, (postgres) total-vm:292300kB, anon-rss:8432kB, file-rss:3712kB
$
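
The numbers 73075 and 3036 on the last per-process line for PID 2697 can be cross-checked against the kill message. Assuming the usual layout of the kernel's OOM task dump, they are total_vm and rss counted in pages; the sketch below converts them to kB with an assumed 4 kB page size (getconf PAGESIZE reports the real value). pages_to_kb is a hypothetical helper name.

```shell
PAGE_KB=4   # assumed page size in kB; verify with: getconf PAGESIZE

# Convert a page count from the OOM task dump to kB.
# pages_to_kb is a hypothetical helper written for this illustration.
pages_to_kb() {
    echo $(($1 * $PAGE_KB))
}

pages_to_kb 73075   # prints 292300, matching total-vm:292300kB
pages_to_kb 3036    # prints 12144, matching anon-rss 8432kB + file-rss 3712kB
```

The resident size at the time of the kill is therefore only about 12 MB, well below the 30M group limit, so the rest of the group's charge must come from elsewhere, such as memory used by the other postgres processes in the same group.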
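I suspected that the memory limit I had set was too small and was interfering with basic operation. PostgreSQL itself needs some resources (shared_buffers and wal_buffers both consume memory).

So I raised the parameter to memory.limit_in_bytes=300M and ran the test again:
the SQL statement described above, this time processing 1200 MB of data, completed successfully, and memory did not grow excessively.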