设为首页 收藏本站
查看: 746|回复: 0

[经验分享] 谈谈Oracle数据库启动时禁用smon事务恢复

[复制链接]
YunVN网友  发表于 2016-8-13 06:25:11 |阅读模式
数据库在异常宕机之后,数据库再次启动时,smon会进行crash recover之后再进行tx recover,这在alert日志可以清晰的看到smon的操作过程。
但事情发展往往不如人意,试想一下当回滚段出现异常时,数据库势必不能启动,这时候我们往往需要借助一些异常手段,将数据库open。
Oracle的一些内部事件允许设置debug_mode,以便允许在AUM模式下,手工进行回滚段的处理:
alter session set "_smu_debug_mode"=4;
alter rollback segment "_SYSSMU7$" ONLINE;
或者使用隐含参数offline_rollback_segments进行指定回滚段处理:
_offline_rollback_segments
drop rollback segment ‘xxx’
如果回滚段正常,某些恢复可能需要很长的时间,在此期间的一些异常可能导致SMON Crash,进而数据库Instance崩溃。这时候我们可以引进一些事件进行诊断。
如:
禁用smon恢复
event = '10513 trace name context forever,level 2'
跟踪smon进程
event = '10500 trace name context forever, level 1'

测试案例。
不设置event 系统为初始状态
SQL> show parameter event

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event
并产生大批量脏数据
SQL> insert into testsmon1 select * from testsmon1;

81920 rows created
数据库重启,注意这里使用的是force选项
SQL> startup force
ORACLE instance started.

Total System Global Area  838860800 bytes
Fixed Size                  1262164 bytes
Variable Size             494931372 bytes
Database Buffers          327155712 bytes
Redo Buffers               15511552 bytes
Database mounted.
Database opened.
我们在alert日志里可以看到

ALTER DATABASE OPEN
Fri Dec 17 16:30:49 2010
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
Fri Dec 17 16:30:49 2010
Started redo scan
Fri Dec 17 16:30:49 2010
Completed redo scan
28142 redo blocks read, 1885 data blocks need recovery
Fri Dec 17 16:30:49 2010
Started redo application at
Thread 1: logseq 66211, block 3
Fri Dec 17 16:30:49 2010
Recovery of Online Redo Log: Thread 1 Group 5 Seq 66211 Reading mem 0
  Mem# 0: /oradata/mcstar/mcstar/redo05.log
Fri Dec 17 16:30:49 2010
Completed redo application
Fri Dec 17 16:30:49 2010
db_recovery_file_dest_size of 10240 MB is 79.77% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri Dec 17 16:30:50 2010
Completed crash recovery at
Thread 1: logseq 66211, block 28145, scn 10995135133737
1885 data blocks read, 1885 data blocks written, 28142 redo blocks read
Fri Dec 17 16:30:50 2010
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=29, OS id=16094
Fri Dec 17 16:30:50 2010
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=30, OS id=16111
Fri Dec 17 16:30:50 2010
Thread 1 advanced to log sequence 66212
Thread 1 opened at log sequence 66212
  Current log# 4 seq# 66212 mem# 0: /oradata/mcstar/mcstar/redo04.log
Successful open of redo thread 1
Fri Dec 17 16:30:50 2010
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Dec 17 16:30:50 2010
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Fri Dec 17 16:30:50 2010
SMON: enabling cache recovery
Fri Dec 17 16:30:50 2010
ARC0: Becoming the heartbeat ARCH
Fri Dec 17 16:30:50 2010
Successfully onlined Undo Tablespace 12.
Fri Dec 17 16:30:50 2010
SMON: enabling tx recovery

[ora10g@mcprod bdump]$ more mcstar_smon_15889.trc
/ora10g/app/admin/mcstar/bdump/mcstar_smon_15889.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /ora10g/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      mcprod
Release:        2.6.18-53.el5PAE
Version:        #1 SMP Mon Nov 12 02:55:09 EST 2007
Machine:        i686
Instance name: mcstar
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 15889, image: oracle@mcprod (SMON)

*** SERVICE NAME:() 2010-12-17 16:30:51.329
*** SESSION ID:(549.1) 2010-12-17 16:30:51.329
Dead transaction 0x000a.02d.0006936b recovered by 16 server(s)
*** 2010-12-17 16:30:51.332
SMON: Parallel transaction recovery tried

现在我们创建一个测试表格
SQL> select OBJECT_ID from dba_objects where owner='ZHOU' and OBJECT_NAME='T';

OBJECT_ID
----------
   2275996

SQL> declare
  2  begin
  3  for i in 1..100000 loop
  4  insert into t values(i,i+1);
  5  end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.


SQL> select count(*) from v$bh where objd=2275996;

  COUNT(*)
----------
       434
      
现在将其重启

SQL> startup force
ORACLE instance started.

Total System Global Area  838860800 bytes
Fixed Size                  1262164 bytes
Variable Size             503319980 bytes
Database Buffers          318767104 bytes
Redo Buffers               15511552 bytes
Database mounted.
Database opened.

后台我们看到了smon在进行并行恢复,对象2275996的block已经在内存中,并以current状态存在

SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
xcur           212

我们将其从内存中刷出
SQL> alter system flush buffer_cache;

System altered.
可以看到都为内存free状态
SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
free           392

现在我们查询表格T,数据会从哪里获取呢?从业务表空间还是回滚表空间呢?

SQL> alter session set db_file_multiblock_read_count=1;

Session altered.

SQL> alter session set events '10200 trace name context forever, level 1';

Session altered.

SQL> select count(*) from t;

  COUNT(*)
----------
         0
SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
xcur           425
free           275

观察后台trc进程可以看到,数据库都从业务表空间获取,可见smon进程在数据库启动时已经完成tx恢复。

现在测试的重点来了,引进一个事件
SQL> alter system set event='10513 trace name context forever,level 2' scope=spfile;

System altered.

继续在业务表空间创建大量脏数据

SQL> declare
  2  begin
  3  for i in 1..100000 loop
  4  insert into t values(i,i+1);
  5  end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
xcur           368

将系统重启,可以看到event已经生效
SQL> startup force
ORACLE instance started.

Total System Global Area  838860800 bytes
Fixed Size                  1262164 bytes
Variable Size             528485804 bytes
Database Buffers          293601280 bytes
Redo Buffers               15511552 bytes
Database mounted.
Database opened.
SQL> show parameter event

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event                                string      10513 trace name context forev
                                                 er,level 2

进一步观察alert日志,依然出现SMON: enabling tx recovery字段。
Fri Dec 17 17:02:23 2010
ALTER DATABASE OPEN
Fri Dec 17 17:02:23 2010
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
Fri Dec 17 17:02:23 2010
Started redo scan
Fri Dec 17 17:02:23 2010
Completed redo scan
91999 redo blocks read, 2752 data blocks need recovery
Fri Dec 17 17:02:23 2010
Started redo application at
Thread 1: logseq 66217, block 3
Fri Dec 17 17:02:23 2010
Recovery of Online Redo Log: Thread 1 Group 5 Seq 66217 Reading mem 0
  Mem# 0: /oradata/mcstar/mcstar/redo05.log
Fri Dec 17 17:02:24 2010
Completed redo application
Fri Dec 17 17:02:24 2010
Completed crash recovery at
Thread 1: logseq 66217, block 92002, scn 10995135225524
2752 data blocks read, 2752 data blocks written, 91999 redo blocks read
Fri Dec 17 17:02:24 2010
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=29, OS id=467
Fri Dec 17 17:02:24 2010
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=30, OS id=469
Fri Dec 17 17:02:24 2010
Thread 1 advanced to log sequence 66218
Thread 1 opened at log sequence 66218
  Current log# 4 seq# 66218 mem# 0: /oradata/mcstar/mcstar/redo04.log
Successful open of redo thread 1
Fri Dec 17 17:02:24 2010
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Dec 17 17:02:24 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Dec 17 17:02:24 2010
ARC1: Becoming the heartbeat ARCH
Fri Dec 17 17:02:24 2010
SMON: enabling cache recovery
Fri Dec 17 17:02:24 2010
Successfully onlined Undo Tablespace 12.
Fri Dec 17 17:02:24 2010
SMON: enabling tx recovery
Fri Dec 17 17:02:24 2010
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Dec 17 17:02:24 2010
Streams APPLY A001 started with pid=31, OS id=471
Fri Dec 17 17:02:24 2010
Streams APPLY A002 started with pid=32, OS id=473
Fri Dec 17 17:02:24 2010
Starting background process QMNC
QMNC started with pid=33, OS id=475
Fri Dec 17 17:02:25 2010
Completed: ALTER DATABASE OPEN                                               

但是没有相应trace文件产生,如果要产生trace文件,我们可以加另外一个event 10500,重复上述过程。
SQL> alter system set event='10513 trace name context forever,level 2','10500 trace name context forever, level 1' scope=spfile;

System altered.
数据库重启之后
SQL> startup force
ORACLE instance started.

Total System Global Area  838860800 bytes
Fixed Size                  1262164 bytes
Variable Size             536874412 bytes
Database Buffers          285212672 bytes
Redo Buffers               15511552 bytes
Database mounted.
Database opened.
可以看到跟踪文件smon的状态
[ora10g@mcprod bdump]$ more mcstar_smon_11666.trc
/ora10g/app/admin/mcstar/bdump/mcstar_smon_11666.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /ora10g/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      mcprod
Release:        2.6.18-53.el5PAE
Version:        #1 SMP Mon Nov 12 02:55:09 EST 2007
Machine:        i686
Instance name: mcstar
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 11666, image: oracle@mcprod (SMON)

*** SERVICE NAME:() 2010-12-17 17:08:12.897
*** SESSION ID:(549.1) 2010-12-17 17:08:12.897
*** 2010-12-17 17:08:12.897
SMON: Event 10513 is level 2, trans recovery disabled.
kglScanDependencyHandles4Unpin():
  cumscan=1 cumupin=2 time=64424509440 upinned=0
*** 2010-12-17 17:08:12.976
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.245
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.299
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.469
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.477
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.593
SMON: Event 10513 is level 2, trans recovery disabled.

但是alert日志里面显示,这跟trace矛盾。
Fri Dec 17 17:08:12 2010
SMON: enabling tx recovery

但是在数据库打开之后,我们看到内存中并没有对象2275996
SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

no rows selected
观察业务数据从哪里获取呢?
SQL>  alter session set db_file_multiblock_read_count=1;

Session altered.

SQL> alter session set events '10200 trace name context forever, level 1';

Session altered.

SQL> select count(*) from t;

  COUNT(*)
----------
         0

可以看到有cr状态,推断出有部分数据从undo表空间中获取
SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
xcur           501
cr             408
         
查看跟踪文件,可以看到和cr一样
[ora10g@mcprod udump]$ cat mcstar_ora_26883.trc|grep "uba: 0x008"|wc -l
408

最后将event恢复初始
SQL> alter system set event='' scope=spfile;

System altered.
再次重启系统

可以看到smon将0x000a.02d.00069371和0x0001.021.0006881d作为死事务处理
[ora10g@mcprod bdump]$ more mcstar_smon_14715.trc
/ora10g/app/admin/mcstar/bdump/mcstar_smon_14715.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /ora10g/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      mcprod
Release:        2.6.18-53.el5PAE
Version:        #1 SMP Mon Nov 12 02:55:09 EST 2007
Machine:        i686
Instance name: mcstar
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 14715, image: oracle@mcprod (SMON)

*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-17 17:32:48.011
*** SESSION ID:(549.1) 2010-12-17 17:32:48.011
Dead transaction 0x000a.02d.00069371 recovered by 15 server(s)
Dead transaction 0x0001.021.0006881d recovered by 15 server(s)
*** 2010-12-17 17:32:48.048
SMON: Parallel transaction recovery tried

回过头去看之前的跟踪日志,完全印证之前的事务。
[ora10g@mcprod udump]$ cat mcstar_ora_26883.trc|grep 00069371|wc -l
204
You have new mail in /var/spool/mail/ora10g
[ora10g@mcprod udump]$ cat mcstar_ora_26883.trc|grep 0006881d|wc -l
204
[ora10g@mcprod udump]$ cat mcstar_ora_26883.trc|grep "uba: 0x008"|wc -l
408

当然在设了event 10513之后,我们可以从x$ktuxe获取死事务的相关详细,详见
我的另外一篇文章http://itspace.iyunv.com/blog/644509

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-256892-1-1.html 上篇帖子: Oracle基本数据类型存储格式研究(二)—数字类型 下篇帖子: oracle学习笔记五(数据字典及用户管理)
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表