设为首页 收藏本站
查看: 995|回复: 0

[经验分享] PostreSQL崩溃试验全记录

[复制链接]

尚未签到

发表于 2016-11-22 06:57:50 | 显示全部楼层 |阅读模式
  磨砺技术珠矶,践行数据之道,追求卓越价值
  回到上一级页面: PostgreSQL基础知识与基本操作索引页     回到顶级页面:PostgreSQL索引页
  [作者 高健@博客园  luckyjackgao@gmail.com]
  
  由于客户提到,运行某些大运算量批处理时,PostgreSQL突发性使用大量资源的问题,
  进行了如下的调查,发现确实发生了崩溃。PostgreSQL 需要资源控制方案啊。
  现在正在考虑是否必须要用 os 级别的限制方案:
  过程如下:
  测试环境:
  内存:大约1024MB
  postgresql.conf 设定:
      使用缺省值:checkpoint_segments =3  shard_buffers=32MB
      这些是有意为之,就是想看看数据量大、shared_buffer小的状况:
  建立表(一条记录大约1024字节):
      postgres=# create table test01(id integer, val char(1024));
  向表中插入大量数据(总共插入2400MB)
     postgres=# insert into test01 values(generate_series(1,2457600),repeat( chr(int4(random()*26)+65),1024));
  插入操作会花费一些时间,此时用ps 命令看PostgreSQL个进程对内存使用状况:‘
  



[iyunv@server ~]# ps aux | grep post
root      3180  0.0  0.0 105296   712 pts/1    S    16:31   0:00 su - postgres
postgres  3181  0.0  0.0  70304   676 pts/1    S+   16:31   0:00 -bash
postgres  3219  0.0  0.2 113644  2864 pts/1    S    16:32   0:00 /usr/local/pgsql/bin/postgres -D /gao/data
postgres  3221  0.4  3.0 113724 35252 ?        Ss   16:32   0:01 postgres: writer process                  
postgres  3222  0.2  0.1 113644  1616 ?        Ds   16:32   0:00 postgres: wal writer process              
postgres  3223  0.0  0.0 114380  1148 ?        Ss   16:32   0:00 postgres: autovacuum launcher process     
postgres  3224  0.0  0.0  73332   472 ?        Ss   16:32   0:00 postgres: stats collector process         
root      3252  0.0  0.0 105296   712 pts/2    S    16:32   0:00 su - postgres
postgres  3253  0.0  0.0  70304   676 pts/2    S    16:32   0:00 -bash
postgres  3285  0.0  0.0  83488   740 pts/2    S+   16:32   0:00 ./psql
postgres  3286 14.8 80.2 2598332 924308 ?      Ds   16:32   0:35 postgres: postgres postgres [local] INSERT
root      3333  0.0  0.0  65424   812 pts/3    S+   16:36   0:00 grep post
  可以发现,INSERT操作正在消耗80%以上的内存。
  再等片刻,发现psql端传来反馈:



WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
The connection to the server was lost. Attempting reset: Failed.
!>
  此时看看Log,可以看到:background writer (3321) 已经被干掉了,所有连接被重置。



LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (11 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (15 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (23 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (13 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  background writer process (PID 3221) was terminated by signal 9: Killed
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
FATAL:  the database system is in recovery mode
LOG:  database system was interrupted; last known up at 2013-08-30 16:36:42 CST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/B7657BD0
LOG:  redo starts at 0/B60FE2B8
LOG:  unexpected pageaddr 0/B044C000 in log file 0, segment 184, offset 4505600
LOG:  redo done at 0/B844B940
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
  各个进程都重新生成了:



[iyunv@server ~]# ps aux | grep post
root      3180  0.0  0.0 105296   660 pts/1    S    16:31   0:00 su - postgres
postgres  3181  0.0  0.0  70304   628 pts/1    S+   16:31   0:00 -bash
postgres  3219  0.0  0.5 113644  6276 pts/1    S    16:32   0:00 /usr/local/pgsql/bin/postgres -D /gao/data
root      3252  0.0  0.0 105296   660 pts/2    S    16:32   0:00 su - postgres
postgres  3253  0.0  0.0  70304   628 pts/2    S    16:32   0:00 -bash
postgres  3285  0.0  0.0  83488  1144 pts/2    S+   16:32   0:00 ./psql
postgres  3348  0.0  0.0 113644   984 ?        Ss   16:40   0:00 postgres: writer process                  
postgres  3349  0.0  0.0 113644   732 ?        Ss   16:40   0:00 postgres: wal writer process              
postgres  3350  0.0  0.1 114336  1756 ?        Ss   16:40   0:00 postgres: autovacuum launcher process     
postgres  3351  0.0  0.0  73332   580 ?        Ss   16:40   0:00 postgres: stats collector process         
root      3361  0.0  0.0  65420   796 pts/3    R+   16:44   0:00 grep post
[iyunv@server ~]#
  回到psql端再看,发现连接已经失效了:



!> \
Invalid command \. Try \? for help.
!> \dt;
You are currently not connected to a database.
!>
  根据向社区提问,据说是因为OS级别的OOM错误发生,所以杀死了Postmaster进程。
  总之,此种情况表明,如果没有对资源消费总量的控制,那么突发性的用户访问会导致崩溃的。
  
  [作者 高健@博客园  luckyjackgao@gmail.com]
  回到上一级页面: PostgreSQL基础知识与基本操作索引页     回到顶级页面:PostgreSQL索引页
  磨砺技术珠矶,践行数据之道,追求卓越价值

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-303642-1-1.html 上篇帖子: Hadoop Ambari 安装 下篇帖子: LoopBack – 开源的,可扩展的 Node.js 框架
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表