设为首页 收藏本站
查看: 673|回复: 0

[经验分享] Oracle shutdown 过程中 DB hang住 解决方法

[复制链接]

尚未签到

发表于 2016-8-5 11:11:22 | 显示全部楼层 |阅读模式
  


一.故障处理过程

  

  数据库版本:Oracle10.2.0.5.4
  操作系统: Redhat5.4
  
  中午同事对表添加字段,然后造成70多个存储等对象无效,系统不能正常运行,查看了一下同事添加字段的表,并没有锁没有释放。
  尝试编译无效对象,部分可以编译,部分不能编译,对象一直处于编译状态。Kill掉其他的session 后也是无法编译,因为影响业务,决定重启数据库。
  
  --关闭DB:
  [oracle@qs-xezf-db1 ~]$ sqlplus / as sysdba
  SQL*Plus: Release 10.2.0.5.0 - Productionon Thu Mar 29 13:23:14 2012
  Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
  
  Connected to:
  Oracle Database 10g Enterprise EditionRelease 10.2.0.5.0 - 64bit Production
  With the Partitioning, OLAP, Data Miningand Real Application Testing options
  
  SYS@xezf(qs-xezf-db1)> alter systemswitch logfile;
  System altered.
  
  SYS@xezf(qs-xezf-db1)> alter systemswitch logfile;
  System altered.
  
  SYS@xezf(qs-xezf-db1)> alter systemswitch logfile;
  System altered.
  --个人习惯,关闭db前先切换下日志。
  
  SYS@xezf(qs-xezf-db1)> shutdownimmediate
  --一直挂在这
  
  --跟踪LOG
  [oracle@qs-xezf-db1 bdump]$ tail -falert_xezf.log
  Stopping background process QMNC
  Thu Mar 29 13:24:08 CST 2012
  Stopping background process CJQ0
  Thu Mar 29 13:24:10 CST 2012
  Stopping background process MMNL
  Thu Mar 29 13:24:11 CST 2012
  Stopping background process MMON
  License high water mark = 201
  Thu Mar 29 13:24:12 CST 2012
  Job queue slave processes stopped
  Thu Mar 29 13:29:11 CST 2012
  Active call for process 11403 user 'oracle'program 'oracle@qs-xezf-db1'
  System State dumped to trace file/u01/app/oracle/admin/xezf/udump/xezf_ora_13569.trc
  Thu Mar 29 13:29:29 CST 2012
  SHUTDOWN: waiting for active calls tocomplete.
  Thu Mar 29 13:44:15 CST 2012
  MMNL absent for 1207 secs; Foregroundstaking over
  --等了20分钟的时间,最后还是挂住了,在这个状态等了10分钟,还是不能结束。
  
  决定强制关闭DB:
  [oracle@qs-xezf-db1 ~]$ sqlplus / as sysdba
  SQL*Plus: Release 10.2.0.5.0 - Productionon Thu Mar 29 13:53:13 2012
  
  Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
  
  Connected.
  
  SQL> shutdown abort
  ORACLE instance shut down.
  
  
  --启动DB:
  SQL> startup
  ORACLE instance started.
  
  Total System Global Area 5251268608 bytes
  Fixed Size 2103600 bytes
  Variable Size 1275070160 bytes
  Database Buffers 3959422976 bytes
  Redo Buffers 14671872 bytes
  Database mounted.
  Database opened.
  SQL>
  
  --在编译无效对象,系统恢复正常。
  
  
  


二.分析过程



2.1 分析system state dump

  在关闭DB的时候,自动生成了一个system dump 文件,使用ass.awk 分析:
  
  Oracle 使用ass.awk 工具查看system state dump 说明
  http://blog.csdn.net/tianlesoftware/article/details/7237729
  
  [oracle@qs-xezf-db1 src]$ awk -f ass109.awk/u01/app/oracle/admin/xezf/udump/xezf_ora_13569.trc
  
  Starting Systemstate 1
  .............................................
  Ass.Awk Version 1.0.9 - Processing/u01/app/oracle/admin/xezf/udump/xezf_ora_13569.trc
  
  System State 1
  ~~~~~~~~~~~~~~~~
  1:
  2:waiting for 'pmon timer'
  3:waiting for 'rdbms ipc message'
  4:waiting for 'rdbms ipc message'
  5:waiting for 'rdbms ipc message'
  6:waiting for 'rdbms ipc message'
  7:waiting for 'rdbms ipc message'
  8:last wait for 'rdbms ipc message'
  9:last wait for 'smon timer'
  10: waiting for 'rdbms ipc message'
  14:
  15:
  17: waiting for 'rdbms ipc message'
  18: waiting for 'rdbms ipc message'
  23: waiting for 'SQL*Net message fromclient'
  Cmd: PL/SQL Execute
  26: last wait for 'ksdxexeotherwait'
  38: waiting for 'SQL*Net message fromclient'
  Cmd: PL/SQL Execute
  39: waiting for 'SQL*Net message fromclient'
  40: waiting for 'SQL*Net message fromclient'
  41: waiting for 'SQL*Net message fromclient'
  75: waiting for 'SQL*Net message fromclient'
  88: waiting for 'SQL*Net message fromclient'
  Cmd: PL/SQL Execute
  95: waiting for 'SQL*Net message fromclient'
  96: waiting for 'SQL*Net message fromclient'
  97: waiting for 'SQL*Net message fromclient'
  99: waiting for 'SQL*Net message fromclient'
  104:waiting for 'SQL*Net message fromclient'
  Cmd: PL/SQL Execute
  106:waiting for 'SQL*Net message fromclient'
  107:waiting for 'SQL*Net message fromclient'
  108:waiting for 'SQL*Net message fromclient'
  117:waiting for 'SQL*Net message fromclient'
  Cmd: Update
  129:waiting for 'SQL*Net message fromclient'
  Cmd: Update
  130:waiting for 'SQL*Net message fromclient'
  133:waiting for 'SQL*Net message fromclient'
  138:waiting for 'SQL*Net message fromclient'
  140:last wait for 'db file sequential read'(1,974b,1)
  Cmd: Insert
  154:waiting for 'SQL*Net message fromclient'
  Cmd: PL/SQL Execute
  159:waiting for 'SQL*Net message fromclient'
  160:waiting for 'SQL*Net message fromclient'
  162:waiting for 'SQL*Net message fromclient'
  171:waiting for 'SQL*Net message fromclient'
  196:waiting for 'SQL*Net message fromclient'
  204:waiting for 'SQL*Net message fromclient'
  Cmd: PL/SQL Execute
  216:waiting for 'SQL*Net message fromclient'
  Cmd: PL/SQL Execute
  220:waiting for 'SQL*Net message from client'[Latchfor]
  Blockers
  ~~~~~~~~
  
  Above is a list of all the processes. If they are waiting for a resource
  then it will be given in square brackets. Below is a summary of the
  waited upon resources, together with the holder of that resource.
  Notes:
  ~~~~~
  o A process id of '???' implies that the holder was not found in the
  systemstate.
  
  Resource Holder State
  Latch for ??? Blocker
  
  Object Names
  ~~~~~~~~~~~~
  Latch for SHUTDOWN: waiting foractive calls to complete.
  
  
  4767 Lines Processed.
  [oracle@qs-xezf-db1 src]$
  
  


2.2 分析:SHUTDOWN: waiting for active calls tocomplete.

  在此次shutdown过程中,在这一步停了近15分钟。
  
  Thu Mar 29 13:29:11 CST 2012
  Active call for process 11403 user 'oracle'program 'oracle@qs-xezf-db1'
  System State dumped to trace file/u01/app/oracle/admin/xezf/udump/xezf_ora_13569.trc
  Thu Mar 29 13:29:29 CST 2012
  SHUTDOWN: waiting for active calls tocomplete.
  Thu Mar 29 13:44:15 CST 2012
  MMNL absent for 1207 secs; Foregroundstaking over
  
  MOS上解释如下:
  Alert Log: Shutdown Waiting for ActiveCalls to Complete [ID 1039389.6]
  

2.2.1 Problem Description
  You areattempting to shut down the database and the database hangs. The
  alert log contains the following message:
   --我们尝试关闭DB时,数据库hang住了,然后alert log 就会出现如下信息:
  SHUTDOWN: Waiting for active calls to complete
  
  
  There are noother error messages in the alert log.

并且没有其他的错误信息了。


2.2.2 Solution Description: --解决方法描述


  --------------------------------------------------------
  Caveat:This note is intended for Unix platforms primarily, but we shouldaddress this issue on Windows platforms as well. On Windows, stopping the service is a validworkaround for these errors.
  --------------------------------------------------------
  警告:这个方式针对Unix 平台,但是windows 平台同样适用。
  
  Locate and kill any client connections tothe database at the Unix level, as follows:
  解决方法是在Unix 系统级别查找并kill 掉任何客户端的连接,具体方法如下:

  1. Locate any client connections to thedatabase using ps, and grep for anyprocesses belonging to this .
  
  Example: ps -ef | grep SID
  
  2. Look for processes that include a'Local=No' designation.
  Example:osupport 6235 1 0 Nov 24 0:01 oracleSID (LOCAL=NO)
  
  更多内容,参考:
  Oracle 服务器 进程中的 LOCAL=NO 和 LOCAL=YES
  http://blog.csdn.net/tianlesoftware/article/details/5675404
  
  3. Kill the Unix process(es) with the'Local=No' designation.
  --kill Local=no 的进程
  Example: Kill -9 6235


2.2.3 Explanation –造成这个现象的原因

  The database iswaiting for pmon to clean up processes, but pmon is unable to clean them. Theclient connections to the server are causing the shutdown immediate or normalto hang. Killing them allows pmon toclean up and release the associated Oracle processes and resources.
  --数据库等待PMON 进程清理这些LOCAL=NO的进程,但是PMON 进程不能清理他们,这些进程就会导致数据库shutdown 时hang住,kill 掉这些进程,然后让PMON 来清理和释放其他的数据库进程和资源,完成数据库关闭操作。
  
  关于PMON进程的更多内容,参考:
  Oracle 后台进程 说明
  http://blog.csdn.net/tianlesoftware/article/details/5587788
  
  
  What resources are we talking about?
  --组成DB 关闭的资源包括:
  
  1) Any non committed transactions must berolled back
  2) Any temporary space (sort segments /lobs / session temporary tables) must be freed
  3) The session itself and any associatedmemory consumed by the session.
  4) Internal locks / enqueues must becleaned up
  
  
  Often Oracle(SMON or PMON depending on whether Shared Server is used) will wait for the OSto terminate the process(es) associated with the session. I the OS never returns, or fails to terminatethem, then the instance shutdown will hang with this message (Shutdown Waitingfor Active Calls to Complete)
  
  Other meansexist to achieve a quick shutdown, as outlined in Note386408.1 - What Is The Fastest Way To Cleanly Shutdown An Oracle Database?
  
  Note that inE-Business Environments, the same messages can be produced if the databaseshutdown process starts before the concurrent manager stops in the applicationserver .
  


2.3 分析:MMNL absent for 1207 secs; Foregroundstaking over

  
  在我遇到的这次故障中,我等待了10分钟,依旧挂在这,之后就shutdown Abort db了。
  
  MOS 说明:
  "MMNL absent for %u secs; Foregroundstaking over" Messages in Alert.log [ID 465891.1]
  
  When there is ahang situation for a long period of time, oracle writesthis message tothealert.log:"MMNL absent for %u secs; Foregrounds takingover".
   --当数据库hang住一段时间后,Oracle 就会写这个信息到alert log
  
  Thu Nov 1 15:20:56 2007
MMNL absent for 3805 secs; Foregrounds taking over
Thu Nov 1 15:58:13 2007
Starting ORACLE instance (normal)
  
  If this is ahang situation duringshutdown (after MMNL is stopped) or during databaseopen (when MMNL is not yet started), messages are written to the alert.logevery 5 minutes.
  --在数据库关闭过程中(MMNL 已经停止)或者在数据库open时(MMNL 还没有启动),这个消息会每隔5分钟写入一次alert log
  
  Any completehang situations which take a long time trigger these messages.
  
  2.3.1 Cause –导致原因
  These messagesare for informational purposes only to indicate a hang situation which lasted along period of time and MMNL is unable to run its job at that moment. When thehang lasts for several hours before the instance is completely stopped/killed,the tail of the alert.log may flood with these messages. Although the messagetext contains MMNL process, in this case MMNL is actuallythe victim of thehang issue rather than the cause.
  --这个信息唯一的作用是用来说明数据库已经hang住了,并且MMNL 也不能执行它的job
  
  
  2.3.2 Solution --解决方法
  In case of ahang taking a very long time, there is no way to prevent these messages to bewritten to alert.log. Hang situation must be resolved to stop the messages.Starting with 11.1 version, messages are written to a trace file rather thanthe alert.log to prevent messages to be written to alert.log repeatedly manytimes.
  --在某些情况下,可能会hang住很长时间,并且没有方法阻止数据库往alert log里写入这些信息。从Oracle 11gR1开始,该消息将写入tracefile,而不是alert log从而避免重复的往alert log里写信息。
  
  


三.小结

  在我这次遇到的故障,有2个问题:
  


3.1 添加字段导导致对象无效

  因为系统正在运行,修改的对象也是核心表,所以造成了很多问题,这个归根到底是规范问题,不多说。这里为了尽量减少时间,我重启了DB,但是重启过程导中导致数据库hang住了。
  
  这里可选的操作有hanganalyze,system dump,检查锁,等待情况:
  
  Oracleoradebug 命令 使用说明
  http://blog.csdn.net/tianlesoftware/article/details/6525628
  
  OracleHANGANALYZE 功能诊断 DBhanging
  http://blog.csdn.net/tianlesoftware/article/details/6321961
  
  Oracle 一次 锁表 处理小记
  http://blog.csdn.net/tianlesoftware/article/details/6679014
  
  Oracle 查看 对象 持有锁的情况


  http://blog.csdn.net/tianlesoftware/article/details/6822321
  
  
  3.2 db 关闭过程中hang
  我在操作中采用的解决方法是shutdown abort,MOS上提供的解决方法是在操作系统级别kill 掉LOCAL=NO的进程,因为这些进程不能被PMON进程清理导致shutdown 过程hang住。
  
  
  
  
  

  -------------------------------------------------------------------------------------------------------
  版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!
  Skype:  tianlesoftware
  Email:  tianlesoftware@gmail.com
  Blog: http://www.tianlesoftware.com
  Weibo:  http://weibo.com/tianlesoftware
  Twitter: http://twitter.com/tianlesoftware
  Facebook: http://www.facebook.com/tianlesoftware
  Linkedin: http://cn.linkedin.com/in/tianlesoftware
  
  
  -------加群需要在备注说明Oracle表空间和数据文件的关系,否则拒绝申请----
  DBA1 群:62697716(满); DBA2 群:62697977(满)DBA3 群:62697850(满)
  DBA 超级群:63306533(满); DBA4 群:83829929 DBA5群: 142216823
  DBA6 群:158654907 DBA7 群:172855474 DBA总群:104207940

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-253222-1-1.html 上篇帖子: oraToolKit之一安装oraToolKit并检测安装oracle环境 下篇帖子: oracle 11g PL/SQL Programming学习十五
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表