PostreSQL崩溃试验全记录

cy_88 · 发表于 2016-11-22 06:57:50

　　磨砺技术珠矶，践行数据之道，追求卓越价值
　　回到上一级页面： PostgreSQL基础知识与基本操作索引页    回到顶级页面：PostgreSQL索引页
　　[作者高健@博客园  luckyjackgao@gmail.com]
　　
　　由于客户提到，运行某些大运算量批处理时，PostgreSQL突发性使用大量资源的问题，
　　进行了如下的调查，发现确实发生了崩溃。PostgreSQL 需要资源控制方案啊。
　　现在正在考虑是否必须要用 os 级别的限制方案：
　　过程如下：
　　测试环境：
　　内存：大约1024MB
　　postgresql.conf 设定：
　　使用缺省值：checkpoint_segments =3  shard_buffers=32MB
　　这些是有意为之，就是想看看数据量大、shared_buffer小的状况：
　　建立表(一条记录大约1024字节)：
　　 postgres=# create table test01(id integer, val char(1024));
　　向表中插入大量数据(总共插入2400MB)
　　 postgres=# insert into test01 values(generate_series(1,2457600),repeat( chr(int4(random()*26)+65),1024));
　　插入操作会花费一些时间，此时用ps 命令看PostgreSQL个进程对内存使用状况：‘
　　

[iyunv@server ~]# ps aux | grep post
root    3180  0.0  0.0 105296 712 pts/1 S 16:31 0:00 su - postgres
postgres  3181  0.0  0.0  70304 676 pts/1 S+ 16:31 0:00 -bash
postgres  3219  0.0  0.2 113644  2864 pts/1 S 16:32 0:00 /usr/local/pgsql/bin/postgres -D /gao/data
postgres  3221  0.4  3.0 113724 35252 ?       Ss 16:32 0:01 postgres: writer process
postgres  3222  0.2  0.1 113644  1616 ?       Ds 16:32 0:00 postgres: wal writer process
postgres  3223  0.0  0.0 114380  1148 ?       Ss 16:32 0:00 postgres: autovacuum launcher process
postgres  3224  0.0  0.0  73332 472 ?       Ss 16:32 0:00 postgres: stats collector process
root    3252  0.0  0.0 105296 712 pts/2 S 16:32 0:00 su - postgres
postgres  3253  0.0  0.0  70304 676 pts/2 S 16:32 0:00 -bash
postgres  3285  0.0  0.0  83488 740 pts/2 S+ 16:32 0:00 ./psql
postgres  3286 14.8 80.2 2598332 924308 ?    Ds 16:32 0:35 postgres: postgres postgres [local] INSERT
root    3333  0.0  0.0  65424 812 pts/3 S+ 16:36 0:00 grep post
　　可以发现，INSERT操作正在消耗80%以上的内存。
　　再等片刻，发现psql端传来反馈：

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
The connection to the server was lost. Attempting reset: Failed.
!>
　　此时看看Log，可以看到：background writer (3321) 已经被干掉了，所有连接被重置。

LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (11 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (15 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (23 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (13 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  background writer process (PID 3221) was terminated by signal 9: Killed
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
FATAL:  the database system is in recovery mode
LOG:  database system was interrupted; last known up at 2013-08-30 16:36:42 CST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/B7657BD0
LOG:  redo starts at 0/B60FE2B8
LOG:  unexpected pageaddr 0/B044C000 in log file 0, segment 184, offset 4505600
LOG:  redo done at 0/B844B940
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
　　各个进程都重新生成了：

[iyunv@server ~]# ps aux | grep post
root    3180  0.0  0.0 105296 660 pts/1 S 16:31 0:00 su - postgres
postgres  3181  0.0  0.0  70304 628 pts/1 S+ 16:31 0:00 -bash
postgres  3219  0.0  0.5 113644  6276 pts/1 S 16:32 0:00 /usr/local/pgsql/bin/postgres -D /gao/data
root    3252  0.0  0.0 105296 660 pts/2 S 16:32 0:00 su - postgres
postgres  3253  0.0  0.0  70304 628 pts/2 S 16:32 0:00 -bash
postgres  3285  0.0  0.0  83488  1144 pts/2 S+ 16:32 0:00 ./psql
postgres  3348  0.0  0.0 113644 984 ?       Ss 16:40 0:00 postgres: writer process
postgres  3349  0.0  0.0 113644 732 ?       Ss 16:40 0:00 postgres: wal writer process
postgres  3350  0.0  0.1 114336  1756 ?       Ss 16:40 0:00 postgres: autovacuum launcher process
postgres  3351  0.0  0.0  73332 580 ?       Ss 16:40 0:00 postgres: stats collector process
root    3361  0.0  0.0  65420 796 pts/3 R+ 16:44 0:00 grep post
[iyunv@server ~]#
　　回到psql端再看，发现连接已经失效了：

!> \
Invalid command \. Try \? for help.
!> \dt;
You are currently not connected to a database.
!>
　　根据向社区提问，据说是因为OS级别的OOM错误发生,所以杀死了Postmaster进程。
　　总之，此种情况表明，如果没有对资源消费总量的控制，那么突发性的用户访问会导致崩溃的。
　　
　　[作者高健@博客园  luckyjackgao@gmail.com]
　　回到上一级页面： PostgreSQL基础知识与基本操作索引页    回到顶级页面：PostgreSQL索引页
　　磨砺技术珠矶，践行数据之道，追求卓越价值

账号		自动登录	找回密码
密码			立即注册

VMware vcenter+vSphere 6.5 U2共享

【跟谁学】韩宇极简英语课-技术人员不得不

用Zabbix通过JMX方式监控weblogic

winhex数据恢复教程（非常巨大，内容丰富）

Symantec Backup Exec 2015 2016/2012 BE20

NetScaler VPX部署之：NetScaler Gateway调

zabbix3.4.1安装部署+微信推送信息+大屏显

[经验分享] PostreSQL崩溃试验全记录

扫码加入运维网微信交流群